I built a lightweight Socratic tutor that plugs into any EdTech stack to turn raw student questions into guided hints. It runs statelessly, so you can drop it behind a Flask endpoint or a Next.js API route without managing websocket state. In this tutorial, we will build the core logic with Python and Oxlo.ai, using the OpenAI SDK so the code looks familiar even if you have never touched Oxlo.ai before.

What you'll need

Python 3.10 or newer (I tested on 3.11).
The OpenAI Python SDK. Install it with pip install openai.
An Oxlo.ai API key from https://portal.oxlo.ai. The free tier includes 60 requests per day, which is enough to test this tutor with a full classroom of dummy accounts.
A model choice. I use llama-3.3-70b here because it follows formatting instructions tightly, but you can swap in qwen-3-32b for multilingual classes or deepseek-v3.2 for advanced STEM problems without changing any other code. See https://oxlo.ai/pricing for plan details.

Step 1: Design the tutoring prompt

The system prompt is the entire product. We are not fine-tuning. We are not building a vector RAG pipeline. We are simply telling the model what persona to adopt and what constraints to respect. I learned that giving the model an explicit output format makes parsing easier later. We want two things from every response: a level classification and a single hint. The level tag lets the frontend show a confidence badge or trigger extra resources. The hint must be exactly one question or one concrete sub-step, because students click away when they see a wall of text. I also cap the word count so the model stays terse. One subtle trick: I explicitly forbid giving the final answer. Without that rule, LLMs love to show off and solve the problem immediately, which kills learning.

SYSTEM_PROMPT = """You are a patient math tutor for high school students. Your rules:
1. Never give the final answer directly.
2. First, classify the student's understanding as [beginner], [intermediate], or [advanced] based on their work.
3. Then give exactly one hint or ask exactly one guiding question that moves them forward.
4. If the student is stuck at a definition, explain the concept in one sentence, then ask a follow-up.
5. Keep your response under 120 words.
6. End with a brief encouragement.

Format your response like this:
Level: [level]
Hint: [your hint or question]"""
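A fixed output format only pays off if the caller checks it. The sketch below assumes replies follow the Level/Hint layout above, with the encouragement on a line after the hint. It pulls out the level and the hint and returns None when a reply breaks rules 2 or 5, so the caller can retry or fall back. The names parse_tutor_reply and TutorReply are hypothetical helpers for this tutorial, not part of any SDK.

```python
import re
from typing import NamedTuple, Optional

ALLOWED_LEVELS = {"beginner", "intermediate", "advanced"}

# Matches "Level: [beginner]" (brackets optional) on its own line.
LEVEL_RE = re.compile(r"^\s*Level:\s*\[?(\w+)\]?\s*$", re.IGNORECASE | re.MULTILINE)
# Captures the rest of the line that starts with "Hint:".
HINT_RE = re.compile(r"^\s*Hint:\s*(.+)$", re.IGNORECASE | re.MULTILINE)


class TutorReply(NamedTuple):
    level: str
    hint: str
    extra: str  # anything after the Hint line, e.g. the encouragement


def parse_tutor_reply(text: str, max_words: int = 120) -> Optional[TutorReply]:
    """Return the parsed reply, or None if it violates the prompt's format rules."""
    level_match = LEVEL_RE.search(text)
    hint_match = HINT_RE.search(text)
    if not level_match or not hint_match:
        return None  # missing Level: or Hint: line
    level = level_match.group(1).lower()
    if level not in ALLOWED_LEVELS:
        return None  # model invented a level outside rule 2
    if len(text.split()) > max_words:
        return None  # rule 5: response too long
    extra = text[hint_match.end():].strip()
    return TutorReply(level, hint_match.group(1).strip(), extra)
```

Returning None instead of raising keeps the error path cheap. A web handler can resend the request once, then show a generic hint if the second attempt also fails.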

Step 2: Set up the Oxlo.ai client

Oxlo.ai exposes an OpenAI-compatible endpoint, so the import line and method names are identical to what you already know. Only the base URL and the API key change. I create the client at module level so the function in Step 3 can reuse it. In a web app you would move it into a factory or a dependency-injection setup, but a global client is fine for a standalone script. Because the endpoint follows the OpenAI API, you can prototype against Oxlo.ai and later point base_url at another OpenAI-compatible backend without rewriting your orchestration. In practice, I have found no reason to switch, since Oxlo.ai carries over 45 models and request-based pricing keeps invoices predictable.

from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

# Quick smoke test
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
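Hardcoding the key is fine for a first smoke test, but a key committed to source control is easy to leak. Here is a minimal sketch that reads the key from an environment variable. The variable name OXLO_API_KEY and the helper oxlo_client_kwargs are assumptions for this example; Oxlo.ai does not require either name.

```python
import os

OXLO_BASE_URL = "https://api.oxlo.ai/v1"


def oxlo_client_kwargs(env_var: str = "OXLO_API_KEY") -> dict:
    """Build keyword arguments for openai.OpenAI from the environment.

    OXLO_API_KEY is a name chosen for this sketch, not one Oxlo.ai mandates.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail fast with a readable message instead of a 401 from the API.
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return {"base_url": OXLO_BASE_URL, "api_key": api_key}


# Usage (requires `pip install openai`):
# from openai import OpenAI
# client = OpenAI(**oxlo_client_kwargs())
```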

Step 3: Build the tutor function

Now we wrap the API call. I keep the function signature minimal: it takes the latest student message and an optional history list. Inside, we prepend the system prompt, extend with any prior turns, and append the new user message. I set temperature to 0.7 because we want some variety in the hints, but not so much that the model ignores the formatting rules. In my testing, hints became robotic and repetitive at 0.2, and at 1.0 the model sometimes blurted out the answer. 0.7 was the sweet spot for an educational tone. max_tokens is 200, which comfortably covers a 120-word reply (roughly 160 tokens, using the common estimate of about 0.75 English words per token). Keep in mind that max_tokens is a hard cutoff. It truncates a long reply rather than making the model more concise, so the word limit in the prompt does the real work of keeping hints short. The function returns the raw assistant string. We do not parse it here; we leave that to the caller so the core stays generic.

from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

SYSTEM_PROMPT = """You are a patient math tutor for high school students. Your rules:
1. Never give the final answer directly.
2. First, classify the student's understanding as [beginner], [intermediate], or [advanced] based on their work.
3. Then give exactly one hint or ask exactly one guiding question that moves them forward.
4. If the student is stuck at a definition, explain the concept in one sentence, then ask a follow-up.
5. Keep your response under 120 words.
6. End with a brief encouragement.

Format your response like this:
Level: [level]
Hint: [your hint or question]"""

def tutor_ask(student_message, history=None):
    if history is None:
        history = []
    
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    messages.extend(history)
    messages.append({"role": "user", "content": student_message})
    
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=messages,
        temperature=0.7,
        max_tokens=200,
    )
    
    return response.choices[0].message.content

# Quick test
reply = tutor_ask("How do I solve 3x + 5 = 20?")
print(reply)
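Because tutor_ask leaves parsing to the caller, you can unit-test it without network access if the client is passed in rather than read from a global. This is a sketch of that variant. The client-first signature, the shortened SYSTEM_PROMPT placeholder, and the FakeClient/FakeCompletions classes are hypothetical test scaffolding. The fakes only mimic the response shape the code reads (response.choices[0].message.content).

```python
from types import SimpleNamespace

# Shortened placeholder; reuse the full prompt from Step 1 in real code.
SYSTEM_PROMPT = "You are a patient math tutor for high school students."


def tutor_ask(client, student_message, history=None, model="llama-3.3-70b"):
    """Same logic as above, but the OpenAI-compatible client is injected."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    messages.extend(history or [])  # copy prior turns; caller's list is untouched
    messages.append({"role": "user", "content": student_message})
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0.7,
        max_tokens=200,
    )
    return response.choices[0].message.content


class FakeCompletions:
    """Hypothetical stand-in that records the request and returns a canned reply."""

    def __init__(self, reply):
        self.reply = reply
        self.last_kwargs = None

    def create(self, **kwargs):
        self.last_kwargs = kwargs
        message = SimpleNamespace(content=self.reply)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


class FakeClient:
    """Exposes .chat.completions.create like the real SDK client."""

    def __init__(self, reply):
        self.chat = SimpleNamespace(completions=FakeCompletions(reply))
```

In production you pass the real OpenAI client. In tests you pass FakeClient and then assert on fake.chat.completions.last_kwargs, for example to check that the system prompt always comes first.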

Step 4: Add conversation history

A stateless function is great for scaling, but a tutor without memory is useless. We add a simple REPL loop that accumulates messages in a Python list and passes it back into tutor_ask each turn. The function itself stays stateless because the caller owns the history. This mirrors how a real backend would store turns in Redis or Postgres and replay them on every request. Because Oxlo.ai uses request-based pricing, you can send the full history with every call and still pay the same flat per-request rate, as long as the conversation fits in the model's context window. That matters in EdTech, where a student might paste a three-paragraph word problem and three previous attempts. On token-based providers, the growing history inflates the bill with every turn. With Oxlo.ai, the cost stays predictable regardless of how much prior context you include, which makes it easier to offer unlimited tutoring sessions in a freemium product without surprise overages.
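A rough back-of-the-envelope model shows why replaying history matters under token billing. The numbers here are illustrative assumptions, not measurements. Assume a system prompt of $S$ tokens, a new student message of $u$ tokens, and about $t$ tokens added to the history by each completed turn. Request $k$ then sends roughly $S + (k-1)t + u$ input tokens, so over $N$ turns:

```latex
\text{input tokens}(N) \approx \sum_{k=1}^{N}\bigl(S + (k-1)\,t + u\bigr)
  = N\,(S + u) + t\,\frac{N(N-1)}{2}
```

With $S = 150$, $u = 50$, $t = 250$, and $N = 10$, that comes to $10 \cdot 200 + 250 \cdot 45 = 13{,}250$ input tokens. The $t\,N(N-1)/2$ term grows quadratically as sessions get longer. Under per-request pricing, the same session is simply $N = 10$ requests.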

from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

SYSTEM_PROMPT = """You are a patient math tutor for high school students. Your rules:
1. Never give the final answer directly.
2. First, classify the student's understanding as [beginner], [intermediate], or [advanced] based on their work.
3. Then give exactly one hint or ask exactly one guiding question that moves them forward.
4. If the student is stuck at a definition, explain the concept in one sentence, then ask a follow-up.
5. Keep your response under 120 words.
6. End with a brief encouragement.

Format your response like this:
Level: [level]
Hint: [your hint or question]"""

def tutor_ask(student_message, history):
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    messages.extend(history)
    messages.append({"role": "user", "content": student_message})
    
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=messages,
        temperature=0.7,
        max_tokens=200,
    )
    
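    assistant_msg = response.choices[0].message.content
    # Record both sides of the turn so the next call replays the full conversation.
    # history is mutated in place and also returned for convenience.
    history.append({"role": "user", "content": student_message})
    history.append({"role": "assistant", "content": assistant_msg})
    return assistant_msg, history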

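def run_session():
    history = []
    print("Socratic Tutor (type 'exit' to quit)")
    while True:
        try:
            user_input = input("Student: ").strip()
        except (EOFError, KeyboardInterrupt):
            # Ctrl-D / Ctrl-C ends the session cleanly instead of printing a traceback
            print()
            break
        if not user_input:
            continue  # skip blank lines instead of spending a request on them
        if user_input.lower() in ("exit", "quit"):
            break
        reply, history = tutor_ask(user_input, history)
        print(f"Tutor: {reply}\n")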

if __name__ == "__main__":
    run_session()
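Flat per-request pricing does not lift the model's context limit, so a long session eventually needs trimming. The sketch below keeps only the newest user/assistant pairs that fit under a turn cap and a character budget. The helper name trim_history and its defaults are hypothetical. Characters are used as a crude proxy for tokens, and the code assumes history alternates user and assistant messages, as the Step 4 tutor_ask builds it.

```python
def trim_history(history, max_turns=6, max_chars=8000):
    """Return the newest user/assistant pairs that fit both limits.

    Assumes `history` alternates user and assistant messages, as built by
    tutor_ask in Step 4. The system prompt is added separately and not counted.
    A trailing unpaired message, if any, is dropped.
    """
    pairs = [history[i:i + 2] for i in range(0, len(history) - 1, 2)]
    recent = pairs[-max_turns:] if max_turns > 0 else []
    kept = []
    total = 0
    # Walk from the newest pair backwards so the latest context survives.
    for pair in reversed(recent):
        size = sum(len(msg["content"]) for msg in pair)
        if total + size > max_chars:
            break
        kept = pair + kept
        total += size
    return kept
```

In run_session, you could call `reply, history = tutor_ask(user_input, trim_history(history))`. tutor_ask appends the new turn to the trimmed copy, so the stored history stays bounded.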

Run it

Save the final script as tutor.py, replace YOUR_OXLO_API_KEY with your real key, and execute it in your terminal.

python tutor.py

Here is what a real session looks like. Your exact wording will vary because of sampling, but the structure should match.

Socratic Tutor (type 'exit' to quit)
Student: How do I solve 3x + 5 = 20?
Tutor: 
Level: [beginner]
Hint: Start by isolating the term with x. What happens if you subtract 5 from both sides of the equation?
You've got this.

Student: I got 3x = 15.
Tutor: 
Level: [intermediate]
Hint: Exactly. Now, to find x, divide both sides by the coefficient. What is 15 divided by 3?
Keep going.

Student: x = 5.
Tutor: 
Level: [intermediate]
Hint: Great work. To be sure, plug 5 back into the original equation. Does 3 times 5 plus 5 equal 20?
Nice job today.
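A transcript like the one above also works as a small regression fixture for rule 1 (never give the final answer) when you change prompts or models. The check below is a deliberately crude heuristic and a hypothetical helper. It only flags replies that state `variable = answer` outright, and it assumes the answer is a simple token such as a number. Paraphrased leaks like "the answer is five" will slip through.

```python
import re


def leaks_answer(reply: str, variable: str, answer: str) -> bool:
    """Crude heuristic: flag replies that state `variable = answer` outright."""
    pattern = rf"\b{re.escape(variable)}\s*=\s*{re.escape(answer)}\b"
    return re.search(pattern, reply) is not None


# The three hints from the sample session above (3x + 5 = 20, so x = 5).
session_hints = [
    "Start by isolating the term with x. What happens if you subtract 5 from both sides of the equation?",
    "Exactly. Now, to find x, divide both sides by the coefficient. What is 15 divided by 3?",
    "Great work. To be sure, plug 5 back into the original equation. Does 3 times 5 plus 5 equal 20?",
]
flagged = [hint for hint in session_hints if leaks_answer(hint, "x", "5")]
```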

Next steps

This tutor works as a prototype, and two upgrades make it stick. First, parse the Level: tag with a regex and automatically append a micro-video or Khan Academy link when the level is beginner. Second, A/B test models by switching the model string to kimi-k2.6 or deepseek-v3.2 and measuring how many turns it takes a student to reach the correct answer. Oxlo.ai gives you over 45 models on the same API key, so swapping models is a one-line change. If you embed this behind an async FastAPI route, use the SDK's AsyncOpenAI client (same base_url and api_key arguments, with await on the create call) so requests do not block the event loop. You can then scale horizontally without touching the LLM logic at all.
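Both upgrades fit in a few lines. This sketch makes several assumptions: the resource URL is a placeholder, and topic tagging, the student IDs, and the helper names attach_resource, assign_model, and mean_turns_by_model are hypothetical. The model assignment hashes the student ID with SHA-256 rather than Python's built-in hash(), because hash() on strings is randomized per process and would reshuffle students between server restarts.

```python
import hashlib
import re
from collections import defaultdict
from statistics import mean

LEVEL_RE = re.compile(r"^\s*Level:\s*\[?(\w+)\]?", re.IGNORECASE | re.MULTILINE)

# Hypothetical topic -> resource mapping; replace with your own curated links.
BEGINNER_RESOURCES = {
    "linear-equations": "https://example.com/videos/linear-equations",
}


def attach_resource(reply, topic, resources=BEGINNER_RESOURCES):
    """Append a refresher link when the tutor tagged the student as a beginner."""
    match = LEVEL_RE.search(reply)
    if match and match.group(1).lower() == "beginner" and topic in resources:
        return f"{reply}\n\nNeed a refresher? {resources[topic]}"
    return reply


def assign_model(student_id, models):
    """Deterministically bucket a student into one model arm of the A/B test."""
    digest = hashlib.sha256(student_id.encode("utf-8")).digest()
    return models[int.from_bytes(digest[:8], "big") % len(models)]


def mean_turns_by_model(sessions):
    """sessions: iterable of (model, turns_to_correct_answer) tuples."""
    by_model = defaultdict(list)
    for model, turns in sessions:
        by_model[model].append(turns)
    return {model: mean(turns) for model, turns in by_model.items()}


# Usage:
# model = assign_model("student-42", ["llama-3.3-70b", "kimi-k2.6", "deepseek-v3.2"])
```

Keeping each student on one model for the whole experiment avoids mixing two models' hints inside a single session, which would blur the turns-to-answer comparison.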
