
I built a test-prep tutor that generates practice questions, scores answers, and adapts difficulty based on mistakes. It runs on Oxlo.ai's request-based API, so I can feed it long reading passages and full session history without the cost scaling by token count. Here is how I put it together.
What you'll need
You will need Python 3.10 or newer, the OpenAI Python SDK, and an API key from Oxlo.ai. Install the SDK with pip, create your key in the Oxlo.ai portal, and export it as `OXLO_API_KEY`, the environment variable the client code reads. Because Oxlo.ai uses flat per-request pricing, you can experiment with long prompts and multi-turn sessions without tracking token usage. See the pricing page for plan details.
```bash
pip install openai
```
Step 1: Configure the client
I import the OpenAI SDK and point it at Oxlo.ai. This is the only client I need, because Oxlo.ai is fully compatible with the OpenAI chat completions format. I read the API key from an environment variable out of habit, which also keeps it out of source control. I also like that Oxlo.ai has no cold starts on popular models, so the first request of a session is as fast as the tenth.
```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key=os.environ.get("OXLO_API_KEY", "YOUR_OXLO_API_KEY"),
)
```
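Because of the placeholder fallback, a missing variable does not surface at startup. It typically shows up later as an authentication error on the first request. If you would rather fail immediately, a small helper like the one below works. `load_api_key` is a hypothetical name of my own and is not part of the OpenAI SDK or Oxlo.ai.

```python
import os


def load_api_key(var_name: str = "OXLO_API_KEY") -> str:
    """Return the API key from the environment or raise immediately.

    Hypothetical helper: it replaces the placeholder fallback so a missing
    key fails at startup instead of on the first request.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable before creating the client."
        )
    return key
```

To use it, pass `api_key=load_api_key()` to the `OpenAI(...)` constructor in place of the `os.environ.get(...)` call.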
Step 2: Write the system prompt
The system prompt is the only place that defines the tutor's personality. I keep it in a module-level constant so I can iterate on the wording without touching business logic, and I give the model five concrete rules so its output stays consistent across turns. When I tweak the prompt, I copy the old version to a git-tracked file so I can A/B test tutor personalities later.
```python
SYSTEM_PROMPT = """You are a concise test-prep tutor. Follow these rules exactly:
1. Ask exactly one clear question per turn.
2. After the student answers, judge it as correct, partially correct, or incorrect.
3. Explain the underlying concept in two sentences.
4. Suggest the next difficulty: easier, same, or harder.
5. Stay focused on the topic the student provided."""
```
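An A/B test of tutor personalities only works if each student sees the same prompt version on every visit. One way to do that is to keep the versions in a dictionary and pick one by hashing a stable student ID. This is a sketch under my own assumptions: `PROMPT_VERSIONS`, `assign_prompt_version`, and the version names are hypothetical.

```python
import hashlib

# Hypothetical registry of prompt versions. The texts are shortened
# placeholders; in practice each would hold a full system prompt.
PROMPT_VERSIONS = {
    "v1-concise": "You are a concise test-prep tutor. Ask exactly one question per turn.",
    "v2-socratic": "You are a patient test-prep tutor. Give hints before answers.",
}


def assign_prompt_version(student_id: str, versions: dict = PROMPT_VERSIONS) -> str:
    """Deterministically map a student ID to one prompt version name."""
    names = sorted(versions)  # sort so the mapping does not depend on dict order
    digest = hashlib.sha256(student_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(names)
    return names[bucket]
```

For example, `PROMPT_VERSIONS[assign_prompt_version("student-42")]` would replace the single `SYSTEM_PROMPT` constant. I use SHA-256 rather than the built-in `hash()` because Python randomizes string hashes per process, so `hash()` would reshuffle students on every restart.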
Step 3: Generate the first question
I start the session by asking for a single question at a chosen difficulty. I keep the user message deliberately minimal. The system prompt already constrains behavior, so the user message only needs to supply the topic and difficulty level. A short, fixed template also makes it easy to see exactly which user-supplied text reaches the model, although it does not by itself prevent prompt injection through the topic string. I use Llama 3.3 70B because it follows system instructions reliably, but you could swap in Qwen 3 32B or DeepSeek V3.2 by changing only the model string.
```python
def generate_question(topic, difficulty="medium"):
    user_message = (
        f"Topic: {topic}\n"
        f"Difficulty: {difficulty}\n"
        "Generate one practice question. Do not answer it."
    )
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content.strip()
```
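If you want a little more control over what reaches the model, you can build the messages in a separate function. The sketch below is basic input hygiene, not a complete injection defense. It accepts difficulty only from a fixed set, and it collapses the topic onto one bounded line so the topic cannot start new lines of its own. `build_question_messages`, `ALLOWED_DIFFICULTIES`, and the 200-character cap are hypothetical choices of mine.

```python
ALLOWED_DIFFICULTIES = ("easy", "medium", "hard")
MAX_TOPIC_CHARS = 200  # hypothetical cap; tune for your subjects


def build_question_messages(system_prompt: str, topic: str, difficulty: str = "medium") -> list:
    """Build the chat messages for one question request.

    Hygiene only: difficulty must come from a fixed set, and the topic is
    collapsed onto a single bounded line so it cannot start new lines.
    """
    if difficulty not in ALLOWED_DIFFICULTIES:
        raise ValueError(
            f"difficulty must be one of {ALLOWED_DIFFICULTIES}, got {difficulty!r}"
        )
    # str.split() with no argument splits on any whitespace, including newlines.
    clean_topic = " ".join(topic.split())[:MAX_TOPIC_CHARS]
    user_message = (
        f"Topic: {clean_topic}\n"
        f"Difficulty: {difficulty}\n"
        "Generate one practice question. Do not answer it."
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]
```

The result can be passed straight through as `messages=build_question_messages(SYSTEM_PROMPT, topic, difficulty)` in `client.chat.completions.create`.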
Step 4: Evaluate the answer
Scoring is easy, but explaining the gap is what actually helps students learn. I send the student's answer back to the model with the original question and ask for a judgment, a two-sentence concept review, and a difficulty adjustment. This turns every mistake into a micro-lesson. I return the raw text and let the caller decide how to render it. In a production web app, I would parse the difficulty tag with a lightweight regex and update the UI badge separately.
```python
def evaluate_answer(question, user_answer, topic):
    user_message = (
        f"Topic: {topic}\n"
        f"Question: {question}\n"
        f"Student answer: {user_answer}\n\n"
        "Judge the answer, explain the concept, and suggest the next difficulty."
    )
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content.strip()
```
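For the production path mentioned above, here is one possible regex parser for the difficulty tag. It assumes the model ends its feedback with a line like `Suggested difficulty: easier`, as in the sample run later in this post. The system prompt does not enforce that format, so the function returns `None` when the tag is missing, and the caller should then keep the current difficulty. `parse_difficulty_suggestion` is a hypothetical helper.

```python
import re
from typing import Optional

# Matches "Suggested difficulty: easier" (case-insensitive). The exact wording
# is an assumption based on observed output, not something the prompt enforces.
_DIFFICULTY_TAG = re.compile(
    r"suggested\s+difficulty\s*:\s*(easier|same|harder)\b",
    re.IGNORECASE,
)


def parse_difficulty_suggestion(feedback: str) -> Optional[str]:
    """Return 'easier', 'same', or 'harder', or None if no tag is found."""
    match = _DIFFICULTY_TAG.search(feedback)
    return match.group(1).lower() if match else None
```

A plain substring check fires whenever the word "harder" appears anywhere in the explanation. Anchoring on the tag avoids that.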
Step 5: Maintain session history
The real power of a tutor is memory. I keep the full conversation history in a Python list and pass the entire list on every request. On providers that bill per token, this gets expensive quickly, because every request resends, and is billed for, all earlier turns. Oxlo.ai charges a flat rate per request, so I do not have to implement sliding-window truncation or token counting to stay within budget. Budget stops being a reason to trim, but the model's context window is still a hard limit on how long one session can grow. Apart from the topic and the current difficulty, the history list is the entire state of the session. That means I can serialize it to JSON, store it in Redis, and resume the session on another machine without losing context. Because Oxlo.ai is compatible with the OpenAI SDK, I do not need a custom client to resume either. If I ever need a massive context window for a full textbook chapter, I can switch the model to DeepSeek V4 Flash with its 1M token context and still pay per request.
```python
class TestPrepSession:
    """Keeps the full chat history so every request sees the whole session."""

    LEVELS = ["easy", "medium", "hard"]

    def __init__(self, topic, model="llama-3.3-70b"):
        self.topic = topic
        self.model = model
        self.history = [
            {"role": "system", "content": SYSTEM_PROMPT},
        ]
        self.difficulty = "medium"

    def start(self):
        self.history.append({
            "role": "user",
            "content": f"I am studying {self.topic}. Ask me a {self.difficulty} difficulty question.",
        })
        return self._chat()

    def submit_answer(self, answer):
        self.history.append({"role": "user", "content": f"My answer: {answer}"})
        feedback = self._chat()
        # Adapt difficulty one step at a time, so "harder" from easy lands on
        # medium instead of jumping straight to hard (and vice versa).
        level = self.LEVELS.index(self.difficulty)
        lowered = feedback.lower()
        if "easier" in lowered:
            level = max(level - 1, 0)
        elif "harder" in lowered:
            level = min(level + 1, len(self.LEVELS) - 1)
        self.difficulty = self.LEVELS[level]
        # Queue the request for the next question; next_question() sends it.
        self.history.append({
            "role": "user",
            "content": f"Give me the next question at {self.difficulty} difficulty.",
        })
        return feedback

    def next_question(self):
        return self._chat()

    def _chat(self):
        # Send the entire history, then append the reply so the next turn sees it.
        response = client.chat.completions.create(
            model=self.model,
            messages=self.history,
        )
        content = response.choices[0].message.content.strip()
        self.history.append({"role": "assistant", "content": content})
        return content
```
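The JSON-and-Redis resume described above can be sketched with two small functions. The snapshot holds the topic and the difficulty alongside the history, because those are the only other pieces of session state. `dump_session`, `load_session`, and the `schema_version` field are hypothetical names, not part of any library.

```python
import json

SCHEMA_VERSION = 1  # hypothetical field so older snapshots can be detected later


def dump_session(topic: str, difficulty: str, history: list) -> str:
    """Serialize the session state to a JSON string (e.g. for a Redis value)."""
    return json.dumps({
        "schema_version": SCHEMA_VERSION,
        "topic": topic,
        "difficulty": difficulty,
        "history": history,
    })


def load_session(payload) -> tuple:
    """Parse a snapshot produced by dump_session back into its parts."""
    data = json.loads(payload)  # accepts str or UTF-8 bytes
    if data.get("schema_version") != SCHEMA_VERSION:
        raise ValueError(f"unsupported schema_version: {data.get('schema_version')!r}")
    return data["topic"], data["difficulty"], data["history"]
```

To resume, create `TestPrepSession(topic)` and assign the loaded `difficulty` and `history` to its attributes. Since `json.loads` also accepts bytes, the value returned by a Redis client's `get` can be passed in directly.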
Run it
The block below wires the session together. I initialize a topic, print the first question, simulate a weak answer on purpose so you can see the adaptation in action, and print the tutor's feedback and adapted follow-up question.
```python
if __name__ == "__main__":
    session = TestPrepSession("AWS S3 fundamentals")
    question = session.start()
    print("TUTOR:", question)

    answer = "S3 is a relational database."
    print("STUDENT:", answer)
    feedback = session.submit_answer(answer)
    print("TUTOR:", feedback)

    next_q = session.next_question()
    print("TUTOR:", next_q)
```
The exact wording changes from run to run, but when I run the script, the output looks like this:
```text
TUTOR: What is the primary storage architecture used by Amazon S3, and how does it differ from file-system storage?
STUDENT: S3 is a relational database.
TUTOR: Incorrect. Amazon S3 is an object storage service, not a relational database. It stores data as objects inside buckets, which makes it ideal for unstructured data and static assets rather than structured rows and tables. Suggested difficulty: easier.
TUTOR: True or false: Amazon S3 is designed primarily for storing unstructured objects such as images, videos, and backups, rather than for running complex SQL queries and transactions.
```
Wrap-up and next steps
This pattern works for any subject where the student benefits from repetition and feedback. Two concrete extensions I am shipping next are retrieval from a textbook PDF using Oxlo.ai's long-context models, and a SQLite-backed spaced-repetition scheduler that uses the difficulty tags to surface weak topics on a cadence. If you want to experiment, swap in a reasoning model like Kimi K2.6 or GLM 5 and see how the explanations change.
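That scheduler is still in progress, so the following is only a sketch of one way it could work. It uses a Leitner-style box system, a standard spaced-repetition technique that I am swapping in as my own choice. An "easier" suggestion sends a topic back to box 0, "harder" promotes it one box, and "same" leaves it where it is. Each box maps to a review interval in days. The table layout, the intervals, and the function names are hypothetical, and days are plain integers to keep the example small.

```python
import sqlite3

# Hypothetical review intervals in days for boxes 0..3 (Leitner-style).
INTERVALS_DAYS = [1, 2, 4, 8]


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS topics ("
        " topic TEXT PRIMARY KEY,"
        " box INTEGER NOT NULL,"
        " due_day INTEGER NOT NULL)"
    )


def record_result(conn: sqlite3.Connection, topic: str, suggestion: str, today: int) -> None:
    """Move a topic between boxes based on the tutor's difficulty suggestion."""
    row = conn.execute("SELECT box FROM topics WHERE topic = ?", (topic,)).fetchone()
    box = row[0] if row else 0
    if suggestion == "easier":    # the student struggled: review again soon
        box = 0
    elif suggestion == "harder":  # the student did well: wait longer
        box = min(box + 1, len(INTERVALS_DAYS) - 1)
    # "same" (or anything else) keeps the current box.
    conn.execute(
        "INSERT OR REPLACE INTO topics (topic, box, due_day) VALUES (?, ?, ?)",
        (topic, box, today + INTERVALS_DAYS[box]),
    )
    conn.commit()


def due_topics(conn: sqlite3.Connection, today: int) -> list:
    """Topics due for review, weakest (lowest box) first."""
    rows = conn.execute(
        "SELECT topic FROM topics WHERE due_day <= ? ORDER BY box, due_day, topic",
        (today,),
    ).fetchall()
    return [topic for (topic,) in rows]
```

In a session, `suggestion` would come from the tutor's "Suggested difficulty" tag after each answer. `due_topics` then decides which topics to quiz first on a given day.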

