Building Conversational AI Systems for Language Learning with LLMs

We are going to build a conversational language tutor that maintains context across turns, corrects grammar gently, and adapts its complexity to the learner's level. This is the core interaction layer for any edtech product that wants to offer realistic speaking practice without hiring human teachers. Most language apps drill vocabulary in isolation, but real fluency comes from open-ended dialogue that forces the learner to produce sentences and recover from errors. I have shipped versions of this for both mobile chat and voice channels, and the architecture stays the same regardless of the frontend. I will show you the exact code I use to run this on Oxlo.ai.

What you'll need

You do not need LangChain, LlamaIndex, or any other orchestration framework for the core loop. A plain Python class and the OpenAI SDK are enough because the model handles the pedagogical logic inside the system prompt. This keeps latency low and debugging simple.

  • Python 3.10 or newer
  • The OpenAI SDK: pip install openai
  • An Oxlo.ai API key from https://portal.oxlo.ai
  • A target language for testing. I will use Spanish in the examples, but you can swap it to any language supported by the model.

For exact plan details, see https://oxlo.ai/pricing.

Step 1: Set up the Oxlo.ai client

I load the API key from an environment variable instead of hardcoding it. Oxlo.ai uses standard HTTP Bearer authentication, so the OpenAI client works without extra plugins once it points at the base URL https://api.oxlo.ai/v1. I will use qwen-3-32b for the tutor. Because Oxlo.ai prices per request rather than per token, I can pass long multilingual system prompts without watching token counters tick up.

from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key=os.environ.get("OXLO_API_KEY")
)

MODEL = "qwen-3-32b"
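One caveat with os.environ.get is that it returns None when the variable is unset, so a missing key only surfaces later as an authentication error. Below is a minimal sketch of a fail-fast check. The helper name require_env is hypothetical and not part of the OpenAI SDK or Oxlo.ai.

```python
import os


def require_env(name, environ=None):
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper for this tutorial; not part of any SDK.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before starting the tutor.")
    return value


# Usage: pass the result to the client constructor.
# api_key = require_env("OXLO_API_KEY")
```

The optional environ argument exists only so the check can be exercised with a plain dict in tests.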

Step 2: Craft the system prompt

I treat the prompt as configuration. The five rules below constrain the model to stay in the target language, correct gently without breaking flow, and keep the conversation moving. I also embed the proficiency level dynamically later, but the static rules live here. Notice that I explicitly forbid off-topic detours. Without this guardrail, open-ended LLMs will happily pivot to cooking recipes or politics when the user runs out of vocabulary.

SYSTEM_PROMPT = """You are a patient Spanish conversation tutor for English speakers.
Rules:
1. Always respond in Spanish, but keep grammar explanations in English.
2. If the user makes a mistake, repeat the corrected phrase inside brackets, then continue the conversation naturally.
3. Match the user's proficiency level. If they use simple sentences, use simple sentences back. If they use subjunctive, mirror that complexity.
4. Never break character to discuss politics, medical advice, or off-topic subjects.
5. Ask one follow-up question at the end of every response to keep the dialogue alive."""
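Since the proficiency level, and potentially the target language, vary per session, one option is to generate the prompt from a template instead of concatenating strings at call time. The sketch below assumes the hypothetical names PROMPT_TEMPLATE and build_system_prompt. The rules are the five above, except that rule 3 says "complex structures" in place of "subjunctive" so it reads sensibly for languages without one.

```python
PROMPT_TEMPLATE = """You are a patient {language} conversation tutor for English speakers.
Rules:
1. Always respond in {language}, but keep grammar explanations in English.
2. If the user makes a mistake, repeat the corrected phrase inside brackets, then continue the conversation naturally.
3. Match the user's proficiency level. If they use simple sentences, use simple sentences back. If they use complex structures, mirror that complexity.
4. Never break character to discuss politics, medical advice, or off-topic subjects.
5. Ask one follow-up question at the end of every response to keep the dialogue alive."""


def build_system_prompt(language="Spanish", level=None):
    """Fill in the target language and, if known, append a proficiency note."""
    prompt = PROMPT_TEMPLATE.format(language=language)
    if level:
        prompt += f"\n[The user is {level}. Adjust your complexity accordingly.]"
    return prompt


print(build_system_prompt("French", "Beginner").splitlines()[0])
```

Keeping the template as data means a prompt change is a one-line diff that can be reviewed and rolled back like any other configuration.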

Step 3: Build the conversation loop

The message list is the only state we need. I append the user's turn, send the full history to Oxlo.ai, and append the assistant's reply. Because Oxlo.ai charges a flat rate per request instead of per token, there is no cost penalty for sending the entire conversation on every turn. This simplifies the architecture: short sessions need neither a vector database nor a summarization step. I return the assistant message so the caller can render it in a UI or forward it to a voice pipeline. The messages list is mutated in place, which is fine for a single-user script. If you deploy this behind an API, store the list in Redis keyed by session ID instead of keeping it in process memory.

# Reuses `client`, `MODEL`, and `SYSTEM_PROMPT` defined in Steps 1 and 2.

def get_tutor_response(messages, user_message):
    # Record the learner's turn before calling the model.
    messages.append({"role": "user", "content": user_message})

    response = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            *messages
        ],
    )

    # Store the reply so the next call sees the full history.
    assistant_msg = response.choices[0].message.content
    messages.append({"role": "assistant", "content": assistant_msg})
    return assistant_msg
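For the multi-user case, each session's history has to be serialized and looked up by ID. The sketch below shows a rough shape for that, with a plain dict standing in for Redis so it runs without a server. SessionStore, its method names, and the "tutor:" key prefix are assumptions for illustration. With a real Redis client, the dict reads and writes would become get and set calls on the same keys.

```python
import json


class SessionStore:
    """Keeps each session's message list as a JSON string under its own key."""

    def __init__(self, backend=None):
        # A dict stands in for Redis here; values are stored as JSON strings,
        # the same form they would take in a key-value store.
        self.backend = {} if backend is None else backend

    def _key(self, session_id):
        return f"tutor:{session_id}"

    def load(self, session_id):
        raw = self.backend.get(self._key(session_id))
        return json.loads(raw) if raw else []

    def save(self, session_id, messages):
        self.backend[self._key(session_id)] = json.dumps(messages, ensure_ascii=False)


store = SessionStore()
history = store.load("session-42")  # empty list for a new session
history.append({"role": "user", "content": "Hola"})
store.save("session-42", history)
```

A request handler would load the list, pass it to the tutor function, and save it again after the reply is appended.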

Step 4: Detect and tag proficiency

Self-reported levels are unreliable. A user who says they are intermediate might produce only present-tense sentences, or they might drop complex subjunctive clauses naturally. I use a quick classification call to kimi-k2.6 at the start of the session. The classifier reads the first message and returns a single word. I then inject that word into the system prompt for the rest of the session. This adds one extra request at startup, but on Oxlo.ai that is a fixed cost, not a token-weighted surcharge.

def detect_level(client, first_message):
    # One short classification request at session start.
    response = client.chat.completions.create(
        model="kimi-k2.6",
        messages=[
            {"role": "system", "content": "You are a language proficiency classifier. Output only one word: Beginner, Intermediate, or Advanced."},
            {"role": "user", "content": f"Classify this Spanish learner message: {first_message}"},
        ],
    )
    # Strip surrounding whitespace so the label can be embedded in the prompt.
    return response.choices[0].message.content.strip()
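The classifier is asked for a single word, but chat models sometimes wrap the label in punctuation or a short sentence. A small guard can map whatever comes back onto the three expected labels. The sketch below uses the hypothetical name normalize_level. Falling back to Beginner is an assumption, chosen so that an unclear result errs toward simpler replies.

```python
ALLOWED_LEVELS = ("Beginner", "Intermediate", "Advanced")


def normalize_level(raw, default="Beginner"):
    """Map free-form classifier output onto one of the three expected labels."""
    text = (raw or "").lower()
    # Find which labels are mentioned; accept only an unambiguous single match.
    matches = [level for level in ALLOWED_LEVELS if level.lower() in text]
    return matches[0] if len(matches) == 1 else default


print(normalize_level("Intermediate."))  # Intermediate
```

Wrapping the return value of detect_level in this function keeps stray text out of the system prompt.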

Step 5: Wrap it in a session class

The class ties everything together. I initialize the client once and reuse it across turns so the underlying HTTP connections stay warm. I also truncate the history sent to the model to the last twenty messages, which is ten exchanges. Even though Oxlo.ai does not charge per token, the model itself still has a context window, and I would rather trim explicitly than risk silent truncation by the API. Ten exchanges is enough context for a short practice session. If you need longer memory, add a separate summarization step that condenses old turns into a static background paragraph. The start_session method is separate from reply only because it triggers the level detector. After that, both methods use the same underlying create call. If you want to support multiple languages, parameterize the SYSTEM_PROMPT string and load a different template based on the target_language argument.

class LanguageTutor:
    def __init__(self, api_key, target_language="Spanish"):
        self.client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key=api_key)
        self.model = "qwen-3-32b"
        self.language = target_language
        self.history = []
        self.level = None

    def _system_prompt(self):
        # Attach the detected level on every turn, not only the first one.
        if self.level is None:
            return SYSTEM_PROMPT
        return f"{SYSTEM_PROMPT} [The user is {self.level}. Adjust your complexity accordingly.]"

    def _complete(self, messages):
        # Shared create call used by both start_session and reply.
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "system", "content": self._system_prompt()}, *messages],
        )
        reply = response.choices[0].message.content
        self.history.append({"role": "assistant", "content": reply})
        return reply

    def start_session(self, first_message):
        # Classify the learner once, then answer their opening message.
        self.level = detect_level(self.client, first_message)
        self.history.append({"role": "user", "content": first_message})
        return self._complete(self.history)

    def reply(self, user_message):
        self.history.append({"role": "user", "content": user_message})
        # Send only the most recent messages to stay inside the context window.
        return self._complete(self.history[-20:])
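One subtlety with slicing the last twenty messages: the slice is taken right after a user message is appended, so the history has an odd length, and once it grows past twenty the window begins on an assistant reply whose prompting user turn has been cut off. The sketch below is a trimming helper, with the hypothetical name trim_history, that keeps the window starting on a user turn.

```python
def trim_history(history, max_messages=20):
    """Return the most recent messages, starting on a user turn."""
    recent = history[-max_messages:]
    # Drop leading assistant messages so the window opens with the learner's turn.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return recent


# 21 alternating messages, beginning and ending with the user.
demo = [{"role": "user" if i % 2 == 0 else "assistant", "content": str(i)} for i in range(21)]
window = trim_history(demo)
print(len(window), window[0]["role"])  # 19 user
```

Whether a leading assistant message matters depends on the model and the serving stack, so treat this as a precaution, not a requirement.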

Run it

Save the complete script as tutor.py, export your key, and run it. The first call classifies the level and answers the opening message. The second call continues the conversation using the stored history. Watch for the bracketed corrections and the closing question. You should see the tutor stay in Spanish, correct your first sentence, and answer the vocabulary question without switching to English for the explanation. If the model is too strict or too lenient, edit the system prompt and rerun. That is the entire feedback loop.

import os

tutor = LanguageTutor(api_key=os.environ["OXLO_API_KEY"])

first = "Hola, me llamo Juan y quiero practicar español porque mi trabajo necesito hablar con clientes."
print(f"User: {first}")
print(f"Tutor: {tutor.start_session(first)}")
print()

second = "Ah, gracias. Yo no sabía eso. ¿Cómo se dice 'deadline' en español?"
print(f"User: {second}")
print(f"Tutor: {tutor.reply(second)}")

Example output:

User: Hola, me llamo Juan y quiero practicar español porque mi trabajo necesito hablar con clientes.
Tutor: ¡Hola, Juan! Mucho gusto.
[Necesito hablar con clientes en mi trabajo.]
¿En qué sector trabajas y cuántos años llevas usando el español en tu puesto?

User: Ah, gracias. Yo no sabía eso. ¿Cómo se dice 'deadline' en español?
Tutor: De nada.
[No sabía eso.]
Se dice "fecha límite" o "plazo". En muchos países también usamos la palabra inglesa "deadline" en contextos informales.
¿Tienes una fecha límite importante esta semana?
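If a frontend should highlight corrections separately from the rest of the reply, the bracket convention from rule 2 makes them easy to pull out. The sketch below assumes the model follows that convention consistently. extract_corrections is a hypothetical helper.

```python
import re

# Matches text inside one pair of square brackets, without nesting.
CORRECTION_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def extract_corrections(reply):
    """Return the text of every bracketed correction in a tutor reply."""
    return [match.strip() for match in CORRECTION_PATTERN.findall(reply)]


sample = 'De nada.\n[No sabía eso.]\nSe dice "fecha límite" o "plazo".'
print(extract_corrections(sample))  # ['No sabía eso.']
```

An empty list means the tutor found nothing to correct, or did not use brackets, which is worth logging while tuning the prompt.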

Next steps

To turn this into a real product, connect the reply method to a WebSocket endpoint in FastAPI and add a speech layer. Pipe Oxlo.ai's Whisper Large v3 transcriptions into the user_message field, and feed the assistant replies to a TTS service such as Kokoro 82M. This turns the text tutor into a spoken conversation partner.
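The speech layer amounts to wrapping the text turn between two conversions. The sketch below shows that shape with the three stages passed in as callables, so no particular speech API is assumed. voice_turn and its parameter names are hypothetical, and the real transcription and TTS calls would be supplied by the caller.

```python
def voice_turn(audio_in, transcribe, tutor_reply, synthesize):
    """Run one spoken turn: audio in, audio out.

    All three callables are placeholders supplied by the caller:
    transcribe(bytes) -> str, tutor_reply(str) -> str, synthesize(str) -> bytes.
    """
    user_text = transcribe(audio_in).strip()
    if not user_text:
        # Nothing intelligible was heard; skip the tutor call entirely.
        return None, None, None
    reply_text = tutor_reply(user_text)
    return user_text, reply_text, synthesize(reply_text)
```

Returning the transcript and the reply text alongside the audio lets the UI show captions and makes failed turns easier to debug.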

You can also A/B test models by routing a fraction of sessions to a variant that uses deepseek-v3.2 for more technical explanations, keeping qwen-3-32b for beginner-friendly flow. Because Oxlo.ai's pricing is per request, the cost of running two models side by side is predictable and easy to budget.
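For the split itself, a common approach is to hash the session ID so that a given learner always lands in the same arm. The sketch below assumes a 10% variant share and the hypothetical function name pick_model. The model names are the two mentioned above.

```python
import hashlib


def pick_model(session_id, variant_share=0.1,
               control="qwen-3-32b", variant="deepseek-v3.2"):
    """Deterministically assign a session to the control or variant model."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return variant if bucket < variant_share else control
```

Because the assignment depends only on the session ID, no extra state is needed to keep a learner on the same model across turns.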
