
I built a lightweight interactive fiction engine that generates branching story scenes, character dialogue, and player choices from a single prompt. It runs entirely on Oxlo.ai's request-based API, so a long worldbuilding preamble does not inflate the cost. If you are building narrative tools, visual novels, or RPG backends, this pattern drops straight into your stack.
What you'll need
- Python 3.10 or higher
- An Oxlo.ai API key from https://portal.oxlo.ai
- The OpenAI SDK:
pip install openai
I also recommend creating a virtual environment so the openai dependency does not collide with other projects. The code below uses only the standard library plus the SDK, so no extra data science packages are required.
Step 1: Test the connection with a raw premise
Before I lock down schemas or state management, I verify that the endpoint is alive and that the model writes in the tone I want. I point the OpenAI SDK at Oxlo.ai and send a short creative prompt to Llama 3.3 70B. Because Oxlo.ai serves popular models with no cold starts, the first response comes back without a warm-up delay. That rapid feedback loop matters when you are iterating on prompts, because nothing kills creative momentum like a thirty-second cold boot on every prompt tweak.
from openai import OpenAI
client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "You are a terse sci-fi writer."},
        {"role": "user", "content": "Write a one-paragraph opening scene for a heist in a rain-soaked neo-Tokyo."},
    ],
)
print(response.choices[0].message.content)
If the prose feels too dry or too purple, I adjust the system prompt and rerun. This smoke test costs one request on Oxlo.ai, regardless of how long my world bible turns out to be later.
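Hardcoding the key is fine for a smoke test, but reading it from the environment keeps it out of source control. The following is a minimal sketch of that idea. The variable name OXLO_API_KEY and the helper load_api_key are assumptions made for illustration, not names that Oxlo.ai or the OpenAI SDK prescribe.

```python
import os


def load_api_key(env_var: str = "OXLO_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The default variable name is a hypothetical choice for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

With this in place, the client line would become OpenAI(base_url="https://api.oxlo.ai/v1", api_key=load_api_key()).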
Step 2: Define the narrative system prompt
Chat completions are stateless, so the system prompt has to carry the entire ruleset. I treat it like a contract: narration rules, output schema, and tone. I store it in a constant so I can iterate without hunting through function bodies. Keeping the JSON schema inside the system prompt rather than the user message reduces the chance that the model confuses instructions with story content. For entertainment applications, consistency is more important than surprise, so I do not push temperature above the default. If I were building a surreal comedy engine, I might raise it.
SYSTEM_PROMPT = """
You are NarrativeEngine, a deterministic story generator for an interactive fiction game.
Rules:
- Write in second person, present tense.
- Each response must be a single JSON object.
- The JSON must contain:
  - scene_text: a vivid scene description of 2 to 4 sentences.
  - choices: an array of exactly 3 player choices. Each choice has id (1, 2, or 3) and text (max 10 words).
  - mood: one of tense, hopeful, or ominous.
- Do not break character or explain the rules.
"""
I keep the schema shallow: three top-level fields, with the choices array as the only nesting. Deeper nested objects are fine for complex RPG stat blocks, but a shallow structure is easier to validate with Pydantic or plain JSON Schema when you wire this up to a frontend.
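As a rough illustration of how little code that validation needs, here is a standard-library sketch that checks a reply against the rules in SYSTEM_PROMPT. The function name validate_scene and the ALLOWED_MOODS constant are hypothetical helpers, not part of the engine above, and a Pydantic model would do the same job more declaratively.

```python
import json

# Mirrors the mood values listed in SYSTEM_PROMPT.
ALLOWED_MOODS = {"tense", "hopeful", "ominous"}


def validate_scene(raw: str) -> dict:
    """Parse a model reply and raise ValueError if it breaks the schema."""
    scene = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(scene, dict):
        raise ValueError("scene must be a JSON object")

    text = scene.get("scene_text")
    if not isinstance(text, str) or not text.strip():
        raise ValueError("scene_text must be a non-empty string")

    mood = scene.get("mood")
    if not isinstance(mood, str) or mood not in ALLOWED_MOODS:
        raise ValueError(f"mood must be one of {sorted(ALLOWED_MOODS)}")

    choices = scene.get("choices")
    if not isinstance(choices, list) or len(choices) != 3:
        raise ValueError("choices must be a list of exactly 3 items")

    ids = []
    for choice in choices:
        if not isinstance(choice, dict):
            raise ValueError("each choice must be an object")
        if not isinstance(choice.get("text"), str):
            raise ValueError("each choice needs a text string")
        if choice.get("id") not in (1, 2, 3):
            raise ValueError("each choice id must be 1, 2, or 3")
        ids.append(choice["id"])
    if len(set(ids)) != 3:
        raise ValueError("choice ids must be unique")

    return scene
```

The sketch does not enforce the sentence count or the ten-word limit on choice text; those are softer style rules that are usually better handled by the prompt than by a hard failure.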
Step 3: Generate structured story beats with JSON mode
Game clients need structured data, not prose that requires regex hacks. Oxlo.ai supports the same response_format flag as the OpenAI API, so I force valid JSON directly at the API level. I still tell the model in the user prompt to match the schema, which removes ambiguity, but the API constraint is what prevents markdown fences or trailing narration. JSON mode of this kind constrains the syntax rather than which fields appear, so after the call I parse with the standard json module. In a production service I would swap this for Pydantic so I get early validation errors instead of half-rendered scenes.
import json
user_prompt = (
    "Premise: The player is a data-thief on a rooftop in neo-Tokyo. "
    "They just spotted a corporate VTOL descending toward their position. "
    "Generate the opening scene as valid JSON matching the schema."
)
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ],
    # Ask the API itself to return a JSON object rather than free prose.
    response_format={"type": "json_object"},
)
scene = json.loads(response.choices[0].message.content)
print(json.dumps(scene, indent=2))
Notice that the user prompt carries the story premise while the system prompt carries the formatting rules. Separating concerns this way makes it easy to reuse the same engine for fantasy, horror, or romance without rewriting the JSON contract.
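To make that separation concrete, the user message can be assembled by a small helper so that only the genre and premise change between games. This is a sketch under my own assumptions: build_opening_prompt and its "Genre:" prefix are hypothetical and do not appear in the engine above.

```python
def build_opening_prompt(genre: str, premise: str) -> str:
    """Compose the opening user message; formatting rules stay in the system prompt."""
    genre = genre.strip()
    premise = premise.strip()
    if not genre or not premise:
        raise ValueError("genre and premise must both be non-empty")
    return (
        f"Genre: {genre}. "
        f"Premise: {premise} "
        "Generate the opening scene as valid JSON matching the schema."
    )
```

For example, build_opening_prompt("gothic horror", "The player inherits a lighthouse.") produces a user message for a different genre while SYSTEM_PROMPT and the JSON contract stay untouched.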
Step 4: Carry state across turns with a message buffer
To keep characters consistent and plot threads alive, I append each assistant response and the player's choice to the messages list. I wrap this in a small class so the state is portable between a CLI, a FastAPI handler, or a WebSocket worker. Because Oxlo.ai charges per request rather than per token, I do not need to aggressively truncate context to save money. That makes long narrative arcs predictable to budget. If you expect truly epic campaigns, you could switch the model to kimi-k2.6 for its 131K context window, or deepseek-v3.2 for strong coding and reasoning if your game includes procedural puzzles.
class StorySession:
    """Holds the running message buffer for one playthrough."""

    def __init__(self, client, model="llama-3.3-70b"):
        self.client = client
        self.model = model
        # The system prompt is always the first message in the buffer.
        self.messages = [{"role": "system", "content": SYSTEM_PROMPT}]

    def act(self, user_text):
        """Send the player's move and return the parsed scene."""
        self.messages.append({"role": "user", "content": user_text})
        resp = self.client.chat.completions.create(
            model=self.model,
            messages=self.messages,
            response_format={"type": "json_object"},
        )
        content = resp.choices[0].message.content
        # Store the raw reply so the next request sees the full history.
        self.messages.append({"role": "assistant", "content": content})
        return json.loads(content)
The class exposes a single act method. That narrow interface keeps the door open to swapping the backend later, whether you move from Oxlo.ai to a fine-tuned local model or add a caching layer in front of the client.
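Flat pricing removes the cost reason to truncate, but every model still has a finite context window, so a very long campaign may eventually need a cap on the buffer. The following is one possible sketch of that, assuming a simple "keep the last N turns" policy. The helper trim_history and its max_turns default are hypothetical and are not part of the class above.

```python
def trim_history(messages: list[dict], max_turns: int = 20) -> list[dict]:
    """Keep system messages plus at most the last 2 * max_turns dialogue messages.

    The kept dialogue always starts on a user message, so the model never
    sees an assistant reply without the move that caused it.
    """
    if max_turns < 1:
        raise ValueError("max_turns must be at least 1")
    system = [m for m in messages if m["role"] == "system"]
    dialogue = [m for m in messages if m["role"] != "system"]
    recent = dialogue[-2 * max_turns:]
    # Drop a leading assistant message left over from a cut-off turn.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return system + recent
```

One way to use it would be to pass messages=trim_history(self.messages) inside act while leaving self.messages itself intact. The trade-off is that anything outside the window is forgotten by the model, so early plot threads would need to be summarized elsewhere if they still matter.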
Step 5: Run the interactive loop
I wire the session into a small CLI loop. It prints the scene, lists choices, and waits for input. I feed the player's decision back as a simple sentence so the model knows which branch to advance. The input validation is minimal here because only a choice number and the model's own choice text ever reach the next request, never free text from the player. If a production server accepts raw user text, validate it first, for example with pydantic.validate_call or at least length limits, to reduce the risk of prompt injection. If you want to support multilingual stories or more complex agentic tool use, you can swap the model string to qwen-3-32b or kimi-k2.6 without touching any other code.
def run_game(session, premise):
    scene = session.act(premise)
    while True:
        print("\n" + scene["scene_text"])
        print(f"Mood: {scene['mood']}")
        for c in scene["choices"]:
            print(f" {c['id']}. {c['text']}")
        raw = input("\nChoose 1 to 3 (or q to quit): ").strip()
        if raw.lower() == "q":
            break
        try:
            choice_num = int(raw)
            # next() raises StopIteration when no choice has this id.
            chosen = next(c for c in scene["choices"] if c["id"] == choice_num)
        except (ValueError, StopIteration):
            print("Invalid choice.")
            continue
        player_move = f"The player chooses option {choice_num}: {chosen['text']}"
        scene = session.act(player_move)

session = StorySession(client)
premise = (
    "Premise: The player is a data-thief on a rooftop in neo-Tokyo. "
    "They just spotted a corporate VTOL descending toward their position."
)
run_game(session, premise)
The loop blocks on standard input, which is fine for a prototype. If you are shipping this to players, replace input with an HTTP POST or a WebSocket message handler. The core logic stays identical.
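Moving off the CLI is easier if the choice lookup lives in a pure function that any transport can call. The sketch below pulls that logic out of the loop and adds a length cap. The name resolve_choice and the MAX_INPUT_CHARS value are assumptions for illustration, not part of the loop above.

```python
from typing import Optional

# Hypothetical cap: valid input is only ever a short number such as "1".
MAX_INPUT_CHARS = 16


def resolve_choice(scene: dict, raw: str) -> Optional[dict]:
    """Return the matching choice dict, or None if raw is not a valid choice id."""
    raw = raw.strip()
    if not raw or len(raw) > MAX_INPUT_CHARS:
        return None
    try:
        choice_num = int(raw)
    except ValueError:
        return None
    for choice in scene.get("choices", []):
        if choice.get("id") == choice_num:
            return choice
    return None
```

A handler for an HTTP POST or a WebSocket message could call resolve_choice with the request payload, reject the request when it returns None, and otherwise build the same "The player chooses option ..." sentence that the CLI loop sends.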
Run it
Executing the script starts the loop. Here is the kind of structured scene Llama 3.3 70B returns on Oxlo.ai after the player selects the first option. The JSON is clean enough to feed directly into a Unity UI, a React frontend, or even a Twine exporter.
{
  "scene_text": "You flatten yourself against a ventilation duct as the VTOL's searchlight sweeps the rooftop. Rain hisses against carbon-fiber rotors. Through the grate beneath your boots, you see the data-core vault glowing amber three floors down.",
  "choices": [
    {"id": 1, "text": "Rappel down the maintenance shaft"},
    {"id": 2, "text": "Hack the VTOL's camera feed"},
    {"id": 3, "text": "Trigger the fire alarm as a diversion"}
  ],
  "mood": "tense"
}
If the player keeps going, the model remembers previous choices because the full message buffer travels with every request. That continuity is what separates a scripted choose-your-own-adventure from a real generative narrative engine. After three or four turns the context is substantial, yet the cost remains one flat request per move.
Wrap-up
This engine is fewer than sixty lines of Python, yet it is structured cleanly enough to slot behind a FastAPI endpoint or a Unity WebGL client. Two concrete next steps: add an inventory field to the JSON schema and feed the previous inventory state into each user prompt so the model respects item constraints, or cache completed scenes in Redis so players can resume mid-arc without replaying from the opening. You could also pair the text engine with Oxlo.ai's image generation endpoints, such as Flux.1 or Oxlo.ai Image Pro, to auto-generate scene backgrounds from the scene_text summary. If you want predictable costs while your players pour novel-length histories into the model, the flat per-request pricing on Oxlo.ai keeps your burn rate stable no matter how elaborate the prompts become. See https://oxlo.ai/pricing for plan details.
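The inventory idea can be sketched on the prompt side without changing the session class. The helper below is a rough illustration only: build_move_prompt, the wording of the instruction, and the choice to send the inventory as a JSON array are all assumptions, and the schema in SYSTEM_PROMPT would still need its own inventory field.

```python
import json


def build_move_prompt(choice_id: int, choice_text: str, inventory: list[str]) -> str:
    """Describe the player's move and restate the current inventory for the model."""
    # Deduplicate and sort so the same inventory always serializes the same way.
    items = json.dumps(sorted(set(inventory)))
    return (
        f"The player chooses option {choice_id}: {choice_text}\n"
        f"Current inventory (JSON array): {items}\n"
        "Only let the player use items that appear in the inventory."
    )
```

The returned string would replace the player_move sentence passed to session.act, so the inventory travels with every turn alongside the rest of the message buffer.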

