
I needed a translation pipeline that could handle long product documentation without costs scaling by the token, so I built a lightweight CLI tool on top of an LLM. This guide walks through the exact Python script I shipped. It calls Oxlo.ai, which charges a flat price per request, so large blocks of text are not metered by input length. That matters when you are piping entire markdown guides or JSON locale files through the model. If you are integrating translation into an app or workflow, this should give you a working foundation in about thirty minutes.
What you'll need
You will need Python 3.10 or newer installed locally. Install the OpenAI SDK with pip install openai; it is the only dependency, because Oxlo.ai exposes an OpenAI-compatible API and no custom client code is required. You also need an Oxlo.ai API key, which you can generate at https://portal.oxlo.ai; the free tier includes 60 requests per day, enough to test this script thoroughly. Finally, grab a text or markdown file you want to translate so you have realistic input. I used a 400-word readme with mixed code fences and YAML front matter for my own tests.
Step 1: Initialize the Oxlo.ai Client
I start by importing the SDK and pointing the client at Oxlo.ai's OpenAI-compatible endpoint. I pull the API key from an environment variable so the credential never appears in the source, which means I can commit the script to version control without scrubbing secrets later. A single test request confirms the endpoint and key work before I move on.
import os
from openai import OpenAI

# Point the standard OpenAI client at Oxlo.ai's endpoint.
client = OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key=os.environ.get("OXLO_API_KEY"),
)

# Quick connectivity check
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Reply 'ready' and nothing else."}],
)
print("Client status:", response.choices[0].message.content)
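Because os.environ.get returns None when the variable is unset, a missing key can surface later as a less obvious error. A minimal sketch of a fail-fast check follows; require_api_key is a hypothetical helper name introduced here for illustration, not part of the SDK or of the original script.

```python
import os


def require_api_key(var_name: str = "OXLO_API_KEY") -> str:
    """Return the API key from the environment or raise a readable error.

    Hypothetical helper: the name and the error wording are assumptions.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {var_name} is not set; "
            "export it before running the script."
        )
    return key
```

If you adopt something like this, pass its return value as api_key when constructing the client so the script stops before any request is sent.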
Step 2: Define the System Prompt
The system prompt is the only behavioral guardrail I need. It tells the model to act as a deterministic translation engine, preserve formatting, and output nothing except the translated text. Keeping this prompt strict prevents the model from adding polite introductions like "Here is the translation" before every response.
SYSTEM_PROMPT = """You are a deterministic translation engine.
Translate the user's text into the requested target language.
Preserve all markdown, code blocks, URLs, and named entities exactly as they appear.
Do not add explanations, preambles, or quotation marks around the output.
Return only the translated text."""
Step 3: Build the Core Translation Function
This function accepts a raw string and a target language, wraps them in the messages array, and calls Llama 3.3 70B through Oxlo.ai. I keep temperature at 0.1 so the output varies little across repeated runs, which matters if you later add caching; a low temperature reduces variation but does not guarantee identical output. Llama 3.3 70B is a solid default for this task on Oxlo.ai, though you could swap in Qwen 3 32B if you are working with primarily Asian language pairs.
def translate(text: str, target_lang: str = "Spanish") -> str:
    # Skip the request entirely for empty or whitespace-only input.
    if not text.strip():
        return ""
    user_message = f"Translate the following text into {target_lang}:\n\n{text}"
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
        temperature=0.1,
    )
    return response.choices[0].message.content.strip()
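To make the payload shape concrete, here is a small sketch that builds the same two-message array without touching the network. build_messages is a hypothetical helper, and the short system prompt in the usage line is a placeholder rather than the prompt from Step 2.

```python
def build_messages(
    system_prompt: str, text: str, target_lang: str
) -> list[dict[str, str]]:
    """Assemble the chat payload for one translation request (illustrative)."""
    user_message = f"Translate the following text into {target_lang}:\n\n{text}"
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]


# Placeholder prompt and text, used only to show the resulting structure.
messages = build_messages("You are a translation engine.", "Hello", "French")
print(messages[1]["content"])
```

Keeping payload construction in a pure function like this is one way to unit-test prompt wording separately from the API call.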
Step 4: Auto-Detect the Source Language
I do not always know the input language, so I run a lightweight detection pass first. I send the first 500 characters, ask the model to return an ISO 639-1 code, and use Oxlo.ai's JSON mode to force a parseable object back. JSON mode is available on Oxlo.ai for models that support structured output, and it saves me from scraping freeform text with brittle regex parsing.
import json

def detect_language(text: str) -> str:
    # Only the first 500 characters are sent; that is enough for detection.
    prompt = (
        "Identify the ISO 639-1 language code of the following text. "
        'Respond with valid JSON in this exact shape: {"language": "en"}.\n\n'
        f"{text[:500]}"
    )
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
        temperature=0.0,
    )
    result = json.loads(response.choices[0].message.content)
    return result.get("language", "unknown")
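JSON mode constrains the reply to valid JSON, but it does not by itself guarantee that the value is a two-letter code. The sketch below shows one way to validate the reply defensively; parse_language_code is a hypothetical helper, and it checks only the shape of the code (two letters), not membership in the ISO 639-1 list.

```python
import json
import re

# Shape check only: two ASCII letters. This does not verify the code is assigned.
_TWO_LETTER_CODE = re.compile(r"^[a-z]{2}$")


def parse_language_code(raw: str) -> str:
    """Extract a two-letter code from a JSON reply, or return "unknown"."""
    try:
        payload = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return "unknown"
    if not isinstance(payload, dict):
        return "unknown"
    code = str(payload.get("language", "")).strip().lower()
    return code if _TWO_LETTER_CODE.match(code) else "unknown"
```

A function like this could replace the last two lines of detect_language if you want malformed replies to degrade to "unknown" instead of raising.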
Step 5: Preserve Document Structure
Real documents contain headers, lists, and code fences. I split the input on double newlines, translate each prose paragraph individually, and copy anything that looks like a code block through unchanged. This prevents the model from touching syntax, URLs, or indentation that must remain unchanged. Paragraph-level translation also keeps each request small enough to avoid timeouts, while Oxlo.ai's lack of cold starts means the loop runs without warmup delays.
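def translate_document(text: str, target_lang: str = "Spanish") -> str:
    paragraphs = text.split("\n\n")
    translated = []
    in_fence = False  # True while inside a fenced block that contains blank lines
    for paragraph in paragraphs:
        stripped = paragraph.strip()
        # An odd number of fence markers means this chunk opens or closes a fence.
        toggles_fence = stripped.count("```") % 2 == 1
        # Check indentation on the raw paragraph; the stripped copy never starts with spaces.
        if in_fence or stripped.startswith("```") or paragraph.startswith("    "):
            translated.append(paragraph)  # leave code untouched
            if toggles_fence:
                in_fence = not in_fence
            continue
        translated.append(translate(stripped, target_lang))
    return "\n\n".join(translated)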
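The paragraph splitter above has no notion of YAML front matter, so a leading metadata block would be sent to the model like any other paragraph. A minimal sketch of peeling it off first follows, assuming the front matter is delimited by lines containing only three hyphens; split_front_matter is a hypothetical helper that is not part of the original script.

```python
def split_front_matter(text: str) -> tuple[str, str]:
    """Split a leading '---' delimited block from the body.

    Returns (front_matter, body); front_matter is "" when none is present.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return "", text
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            cut = index + 1
            return "".join(lines[:cut]), "".join(lines[cut:])
    return "", text  # no closing delimiter, so treat everything as body
```

One way to use it is to translate only the body and then concatenate the untouched front matter back in front of the result.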
Step 6: Inject a Domain Glossary
For technical docs, some terms must stay in English. I append a dynamic instruction to the system prompt so the model receives the constraint fresh on every request. This works better than hoping the model infers which domain vocabulary to leave alone. Because Oxlo.ai charges per request rather than per token, the extra glossary instructions do not inflate cost the way token-based metering would. Splitting by paragraph is a different matter: it increases the number of requests, so under this pricing the request count, not the paragraph length, is what scales with document size.
def translate_with_glossary(
    text: str, target_lang: str, glossary: list[str]
) -> str:
    # Extend the base system prompt with the terms that must stay untranslated.
    glossary_instruction = (
        "\nDo not translate the following terms: "
        + ", ".join(glossary)
        + ". Preserve them in the original language."
    )
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT + glossary_instruction},
        {
            "role": "user",
            "content": f"Translate the following text into {target_lang}:\n\n{text}",
        },
    ]
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=messages,
        temperature=0.1,
    )
    return response.choices[0].message.content.strip()
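A glossary instruction is a request to the model, not a guarantee, so it can help to check the output afterwards. The sketch below reports glossary terms that appear in the source but not in the translation. dropped_glossary_terms is a hypothetical helper, and it uses plain case-insensitive substring matching, which is an assumption that will produce false matches for short terms embedded in longer words.

```python
def dropped_glossary_terms(
    source: str, translated: str, glossary: list[str]
) -> list[str]:
    """Return glossary terms present in the source but missing from the output."""
    source_folded = source.casefold()
    output_folded = translated.casefold()
    return [
        term
        for term in glossary
        if term.casefold() in source_folded
        and term.casefold() not in output_folded
    ]


# Made-up strings for illustration; they are not real model output.
source = "Send a POST request to our API. Your webhook fires."
output = "Envía una solicitud POST a nuestra API. Tu gancho web se activa."
print(dropped_glossary_terms(source, output, ["API", "webhook", "SDK"]))
```

Terms that never occur in the source, such as "SDK" here, are ignored rather than flagged.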
Step 7: Add a CLI Interface
I wire everything together with argparse so the script behaves like a standard Unix tool. It reads the file, detects the source language, translates the content while respecting any glossary, and prints the result to stdout. You can pipe the output directly to a new file or into a continuous integration job.
import argparse
import sys

def main():
    parser = argparse.ArgumentParser(description="Translate documents via Oxlo.ai")
    parser.add_argument("file", help="Path to the text or markdown file")
    parser.add_argument("--target", "-t", default="Spanish", help="Target language")
    parser.add_argument("--glossary", "-g", nargs="*", default=[], help="Terms to keep untranslated")
    args = parser.parse_args()

    with open(args.file, "r", encoding="utf-8") as f:
        source_text = f.read()

    # Report the detected language on stderr so stdout carries only the translation.
    detected = detect_language(source_text)
    print(f"Detected source language: {detected}", file=sys.stderr)

    if args.glossary:
        # The glossary path sends the whole document in one request and relies on
        # the system prompt, not paragraph splitting, to keep code blocks intact.
        result = translate_with_glossary(source_text, args.target, args.glossary)
    else:
        result = translate_document(source_text, args.target)
    print(result)

if __name__ == "__main__":
    main()
Run It
I saved the full script as translate.py and created a sample file readme.md with mixed prose and a shell code block. Running the command below detected English, preserved the fenced code, and returned clean Spanish prose. The entire run consumed only a handful of requests, which on Oxlo.ai means the cost is predictable no matter how long each paragraph grows.
$ export OXLO_API_KEY=oxlo_xxxxxxxx
$ python translate.py readme.md --target Spanish --glossary API webhook
Detected source language: en
# Guía de inicio rápido
Bienvenido a la plataforma. Para autenticar, envía una solicitud POST a nuestro API.
Tu webhook se activará dentro de cinco segundos.
```bash
curl -X POST https://api.example.com/v1/start \
-H "Authorization: Bearer $TOKEN"
```
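Since cost and the free-tier limit are both counted in requests, it can be useful to estimate the request count before a run. The sketch below assumes the paragraph-by-paragraph path (no glossary), where each non-empty paragraph that does not start with a code fence costs one request, plus one request for language detection. estimate_requests is a hypothetical helper and the result is an approximation, not a figure reported by Oxlo.ai.

```python
def estimate_requests(text: str) -> int:
    """Approximate the requests for one paragraph-by-paragraph run."""
    count = 1  # the language-detection request
    for paragraph in text.split("\n\n"):
        stripped = paragraph.strip()
        # Empty chunks and fenced code chunks are never sent to the model.
        if stripped and not stripped.startswith("```"):
            count += 1
    return count


# Made-up sample: a heading, one prose paragraph, and one fenced block.
sample = "# Title\n\nHello world.\n\n```bash\nls\n```\n"
print(estimate_requests(sample))
```

Comparing the estimate against the 60 requests per day on the free tier tells you whether a document fits in a single day's quota.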
Next Steps
To productionize this, I would add an SQLite cache keyed by a hash of the input paragraph so identical blocks never trigger duplicate requests. You could also wrap the script in a GitHub Action to auto-translate documentation on every pull request. Oxlo.ai's request-based pricing keeps these batch jobs predictable: each request costs the same regardless of paragraph length, so the cost of a fifty-paragraph document is known up front. That suits long-context workloads. If you need higher throughput or dedicated capacity, see the exact plans at https://oxlo.ai/pricing.
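The cache idea can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the shipped implementation: the table name, the helper names (open_cache, cache_key, cached_translate), and the choice to include the target language in the hash are all mine. The translation function is passed in as a parameter so the sketch runs without network access.

```python
import hashlib
import sqlite3
from typing import Callable


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the cache database; the schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS translations ("
        "key TEXT PRIMARY KEY, output TEXT NOT NULL)"
    )
    return conn


def cache_key(text: str, target_lang: str) -> str:
    """Hash the target language and paragraph together so languages do not collide."""
    digest = hashlib.sha256()
    digest.update(target_lang.encode("utf-8"))
    digest.update(b"\x00")  # separator between the two fields
    digest.update(text.encode("utf-8"))
    return digest.hexdigest()


def cached_translate(
    conn: sqlite3.Connection,
    text: str,
    target_lang: str,
    translate_fn: Callable[[str, str], str],
) -> str:
    """Return a cached translation, calling translate_fn only on a miss."""
    key = cache_key(text, target_lang)
    row = conn.execute(
        "SELECT output FROM translations WHERE key = ?", (key,)
    ).fetchone()
    if row is not None:
        return row[0]
    output = translate_fn(text, target_lang)
    conn.execute(
        "INSERT OR REPLACE INTO translations (key, output) VALUES (?, ?)",
        (key, output),
    )
    conn.commit()
    return output
```

In the script, translate_fn would be the translate function from Step 3 and path would point at a file on disk so the cache survives between runs.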

