
We are going to build a primary-source analysis agent that ingests long historical or literary texts and returns structured JSON research notes. It is aimed at historians, literary scholars, and archivists who need to process letters, diaries, or manuscripts at scale. Because we are often feeding the model thousands of words of dense source material in a single request, Oxlo.ai's flat per-request pricing removes the usual token anxiety and makes long-context analysis predictable no matter how verbose the original document is. See https://oxlo.ai/pricing for details.
What you'll need
You need Python 3.10 or newer, the OpenAI SDK installed with pip install openai, and an Oxlo.ai API key from https://portal.oxlo.ai. Oxlo.ai's free tier includes 60 requests per day across more than a dozen models, so you can prototype this pipeline without entering payment details. Every model is served without cold starts, which means the first request after lunch returns just as fast as the hundredth.
Step 1: Configure the Oxlo.ai client
Oxlo.ai is a drop-in replacement for the OpenAI SDK. The only changes are the base URL and the API key. I pull the key from an environment variable so it never sits in the repo.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key=os.environ.get("OXLO_API_KEY"),
)
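One thing to watch: os.environ.get returns None when the variable is unset, so a missing key only surfaces later as a less obvious client error. A minimal sketch of a fail-fast check follows; the helper name require_api_key is my own and is not part of the OpenAI SDK or Oxlo.ai.

```python
import os


def require_api_key(var_name: str = "OXLO_API_KEY") -> str:
    """Return the key stored in var_name, or raise with a readable message.

    require_api_key is a hypothetical helper, not an SDK function.
    """
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable before running."
        )
    return key
```

You could then pass api_key=require_api_key() when constructing the client, so a missing key stops the script before any request is sent.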
Step 2: Define the system prompt
The system prompt is the contract between the researcher and the model. I force JSON output so the result can be fed directly into a database or a note-taking tool like Obsidian. The prompt asks for six specific fields that mirror the categories a graduate student would track when reading a source for the first time.
SYSTEM_PROMPT = """You are a humanities research assistant. Your job is to analyze a primary source text and return a JSON object with the following keys:
- summary: a concise overview of the text.
- key_figures: a list of people mentioned with their roles.
- historical_context: notes on events, dates, or references that require explanation.
- literary_devices: a list of techniques observed, if any.
- sentiment: the overall emotional tone.
- suggested_reading: a list of topics for further research.
Return only valid JSON. Do not wrap the output in markdown code fences."""
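Since the prompt is a contract, it can help to check that a reply actually honors it. The sketch below lists any of the six keys that are absent from a parsed response. REQUIRED_KEYS and missing_keys are names I made up for illustration, and the check only looks at key presence, not at the shape of each value.

```python
# The six keys requested in SYSTEM_PROMPT, repeated here so the block stands alone.
REQUIRED_KEYS = (
    "summary",
    "key_figures",
    "historical_context",
    "literary_devices",
    "sentiment",
    "suggested_reading",
)


def missing_keys(notes: dict) -> list[str]:
    """Return the required keys that are absent from notes, in prompt order."""
    return [key for key in REQUIRED_KEYS if key not in notes]
```

An empty list means every field is present; anything else is a signal to retry or flag the source for manual review.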
Step 3: Build the core analysis function
This function accepts a raw text string, forwards it to the model, and returns the parsed response. Keeping the interface limited to a single string makes it easy to test in a notebook or compose into larger pipelines later.
import json

def analyze_primary_source(text: str) -> dict:
    response = client.chat.completions.create(
        model="kimi-k2.6",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": text},
        ],
    )
    raw = response.choices[0].message.content
    return json.loads(raw)
Step 4: Enable JSON mode
Even though the prompt instructs the model to return JSON, I add response_format={"type": "json_object"} as an extra guardrail. This is especially useful when running long letters that might otherwise end with a trailing apology or markdown fence. Because Oxlo.ai does not charge by the token, I can paste a 15,000-word diary entry and still pay the same flat request cost.
def analyze_primary_source(text: str) -> dict:
    response = client.chat.completions.create(
        model="kimi-k2.6",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": text},
        ],
        response_format={"type": "json_object"},
    )
    raw = response.choices[0].message.content
    return json.loads(raw)
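If a model ignores the instruction and wraps its reply in a markdown fence anyway, a small pre-parse cleanup can rescue the response. This is a sketch of one way to do that, not something the SDK provides; strip_code_fence and parse_model_json are hypothetical names, and the logic only handles a fence on the first and last lines.

```python
import json

# Built with multiplication so this example contains no literal fence marker.
FENCE = "`" * 3


def strip_code_fence(raw: str) -> str:
    """Drop a leading fence line (with or without a language tag) and a trailing fence line."""
    text = raw.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # discard the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # discard the closing fence line
        text = "\n".join(lines).strip()
    return text


def parse_model_json(raw: str) -> dict:
    """Parse a model reply as JSON after removing any surrounding fence."""
    return json.loads(strip_code_fence(raw))
```

Swapping json.loads(raw) for parse_model_json(raw) in the function above would apply the cleanup; unfenced replies pass through unchanged.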
Step 5: Load documents from disk
In practice, sources live as .txt or .md files in a folder. This helper reads a file and passes the string upstream. It uses UTF-8 encoding because archival texts often contain accented characters or ligatures.
def analyze_file(path: str) -> dict:
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()
    return analyze_primary_source(text)
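Not every digitized file is clean UTF-8; older transcriptions are sometimes saved in a legacy Windows encoding or carry a byte-order mark. The sketch below tries UTF-8 first and falls back to another codec. The function names and the choice of cp1252 as the fallback are my assumptions, so adjust them to whatever your archive actually uses.

```python
from pathlib import Path


def decode_source_bytes(data: bytes, fallback_encoding: str = "cp1252") -> str:
    """Decode as UTF-8 (dropping a BOM if present), else use the fallback codec."""
    try:
        return data.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Assumed fallback; undecodable bytes become U+FFFD instead of raising.
        return data.decode(fallback_encoding, errors="replace")


def read_source_text(path: str, fallback_encoding: str = "cp1252") -> str:
    """Read a file from disk and decode it with decode_source_bytes."""
    return decode_source_bytes(Path(path).read_bytes(), fallback_encoding)
```

The result of read_source_text(path) can be passed to analyze_primary_source in place of the plain f.read() call.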
Step 6: Add error handling
Even with JSON mode, models can occasionally hallucinate a trailing comma or a markdown fence. This wrapper catches JSONDecodeError, prints a debug snippet, and re-raises a clean exception so the pipeline does not silently fail.
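def safe_analyze(text: str) -> dict:
    try:
        return analyze_primary_source(text)
    except json.JSONDecodeError as e:
        # e.doc is the string json.loads tried to parse, i.e. the model's raw reply,
        # which is more useful for debugging than the input text.
        print("Model returned invalid JSON. Raw snippet:", e.doc[:200])
        raise RuntimeError("Parsing failed") from e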
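A malformed reply is often a one-off, so retrying a couple of times before giving up can save a batch run. Here is a minimal sketch; analyze_with_retry is a hypothetical helper, and it takes the analysis function as a parameter so it can be tried out without calling the API.

```python
import json
import time
from typing import Callable


def analyze_with_retry(
    text: str,
    analyze: Callable[[str], dict],
    attempts: int = 3,
    delay_seconds: float = 1.0,
) -> dict:
    """Call analyze(text), retrying when the reply is not valid JSON."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return analyze(text)
        except json.JSONDecodeError as e:
            last_error = e
            if attempt < attempts:
                time.sleep(delay_seconds)  # brief pause before the next try
    raise RuntimeError(f"Parsing failed after {attempts} attempts") from last_error
```

In this pipeline the call would look like analyze_with_retry(text, analyze_primary_source). Each retry is a separate request, so it counts against the daily request allowance.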
Step 7: Wrap it in a small CLI
A command-line interface lets me point the script at any file without editing the source. I use argparse because it is in the standard library and requires no extra dependencies.
import argparse

def safe_analyze_file(path: str) -> dict:
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()
    return safe_analyze(text)

def main():
    parser = argparse.ArgumentParser(
        description="Analyze a primary source with Oxlo.ai."
    )
    parser.add_argument("file", help="Path to a UTF-8 text file")
    args = parser.parse_args()
    result = safe_analyze_file(args.file)
    print(json.dumps(result, indent=2))

if __name__ == "__main__":
    main()
Run it
I created a file named letter_1940.txt containing the text below. Running python agent.py letter_1940.txt returns structured research notes in a single request. The script prints the parsed result as indented JSON, so you can pipe it straight into jq or convert it into notes for a reference manager such as Zotero.
September 12, 1940
Dear Sir,
I write to you from the countryside where the harvest has been poor this season. The tenants are restless, and the old agreements made in 1928 no longer hold weight with the younger laborers. I have spoken with Mr. Ashford regarding the drainage dispute, but he remains steadfast in his claim to the north field. The tone of the village has turned sour, and I fear what may come of the next quarter.
Yours faithfully,
E. Blackwood
The agent returned the following JSON.
{
  "summary": "A letter reporting agricultural distress, tenant unrest, and a boundary dispute between the writer and Mr. Ashford over the north field.",
  "key_figures": [
    {"name": "E. Blackwood", "role": "writer and landholder"},
    {"name": "Mr. Ashford", "role": "neighbor in drainage dispute"}
  ],
  "historical_context": "References to 1928 agreements suggest interwar agricultural tenancy laws and post-depression rural economic pressure.",
  "literary_devices": ["formal epistolary structure", "foreshadowing in 'I fear what may come'"],
  "sentiment": "anxious and cautionary",
  "suggested_reading": ["British agricultural policy 1928-1940", "interwar land tenancy disputes", "rural labor movements in the 1940s"]
}
Next steps
This agent gives humanities researchers a repeatable first pass at any primary source. There are two concrete directions to take it next. First, batch processing: wrap safe_analyze_file in a loop over a sources/ directory so an entire digitized archive can be processed overnight and the results appended to a SQLite database. Second, automated cataloging: add a second Oxlo.ai call that uses qwen-3-32b to generate Wikipedia search queries or Dewey Decimal classifications from the suggested_reading list, turning raw text into library metadata without manual tagging.
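The batch-processing idea can be sketched in a few lines with the standard library. Everything here is an assumption for illustration: the function name analyze_directory, the notes table and its two columns, and the choice to store each result as a JSON string. The analysis function is passed in as a parameter so the loop can be exercised without touching the API.

```python
import json
import sqlite3
from pathlib import Path
from typing import Callable


def analyze_directory(
    source_dir: str,
    db_path: str,
    analyze_file: Callable[[str], dict],
) -> int:
    """Analyze every .txt/.md file in source_dir and store the notes in SQLite.

    Returns the number of files processed.
    """
    conn = sqlite3.connect(db_path)
    try:
        # Hypothetical schema: one row per source file, notes kept as JSON text.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS notes ("
            "path TEXT PRIMARY KEY, notes_json TEXT NOT NULL)"
        )
        count = 0
        for path in sorted(Path(source_dir).iterdir()):
            if path.suffix.lower() not in {".txt", ".md"}:
                continue
            notes = analyze_file(str(path))
            conn.execute(
                "INSERT OR REPLACE INTO notes (path, notes_json) VALUES (?, ?)",
                (str(path), json.dumps(notes)),
            )
            conn.commit()  # commit per file so finished work survives a crash
            count += 1
        return count
    finally:
        conn.close()
```

With the functions from this tutorial, the call would be analyze_directory("sources", "notes.db", safe_analyze_file). Each file is one request, so on the free tier a large archive would need to be spread across several days.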

