
I built a small CLI tool that turns a text prompt like "a sad piano melody in C minor, 90 bpm" into a playable MIDI file. An LLM running on Oxlo.ai composes the piece in ABC notation, and a short Python script converts that notation into MIDI. It is useful for developers who want to prototype background music for games, apps, or interactive fiction without opening a digital audio workstation.
What you'll need
- Python 3.10 or newer
- An Oxlo.ai API key from https://portal.oxlo.ai
- The Python packages:
  pip install openai music21
- A virtual environment (recommended)
Oxlo.ai is fully compatible with the OpenAI SDK, so client setup is a one-line change to the base URL. Oxlo.ai hosts more than 45 models across seven categories. For music generation I recommend starting with Llama 3.3 70B because it handles structured output reliably, but Qwen 3 32B is a strong alternative if you want to describe your melody in a language other than English.
Step 1: Initialize the Oxlo.ai client
First, I create the client that points to Oxlo.ai. I keep the API key out of source control by reading it from the environment, falling back to a placeholder string so the script does not crash during a quick copy-paste test. Unlike token-based providers, Oxlo.ai does not charge more for longer system prompts or verbose completions. That matters when you are experimenting with large prompts that carry music theory context.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key=os.environ.get("OXLO_API_KEY", "YOUR_OXLO_API_KEY"),
)
Step 2: Craft the system prompt
LLMs are chatty by default. If I ask for a melody, I might get a paragraph of music theory before the notes. To fix this, I use a strict system prompt that tells the model to return only valid ABC notation. ABC is a text format that music21 can parse directly, which keeps the pipeline free of heavy native dependencies. Alternatives like MusicXML are verbose and harder for a model to write correctly without drifting into XML errors.
I also constrain the output length to 8-16 bars. This keeps generation fast and makes the model less likely to hallucinate invalid syntax.
SYSTEM_PROMPT = """You are an expert composer. The user will describe a piece of music.
Write it as valid ABC notation.
Rules:
- Start with a header: X:1, T:<title>, M:<time>, L:<note length>, K:<key>, Q:<tempo>.
- Then provide the melody notes.
- Output ONLY the ABC notation, with no markdown fences or explanation.
- Keep it to 8 to 16 bars.
"""
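Because the prompt asks for 8 to 16 bars, it can help to check what actually came back. The following is a minimal sketch, not part of the original tool: `count_bars`, `BAR_LINE`, and `HEADER_LINE` are hypothetical names, and the count is rough. It assumes a single-voice melody without chord brackets, treats any line starting with a letter and a colon as a header field, and counts repeated sections once rather than expanding them.

```python
import re

# Common ABC bar-line tokens: |, ||, |:, :|, |], [|, :||:
BAR_LINE = re.compile(r"[:\[]?\|+[:\]]?")
# Header/information fields look like "K:D" or "M:4/4"
HEADER_LINE = re.compile(r"\s*[A-Za-z]:")


def count_bars(abc: str) -> int:
    """Roughly count the bars in the tune body (repeats are not expanded)."""
    body = " ".join(
        line
        for line in abc.splitlines()
        if line.strip()
        and not HEADER_LINE.match(line)
        and not line.lstrip().startswith("%")  # skip ABC comments
    )
    # Every non-empty chunk between two bar lines is one bar.
    return sum(1 for segment in BAR_LINE.split(body) if segment.strip())
```

If the count falls outside the range you asked for, one option is to regenerate or to tighten the wording of the length rule in the system prompt.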
Step 3: Generate ABC notation
Now I wire the user prompt to Oxlo.ai. I use llama-3.3-70b and set temperature to 0.7: lower values make the model repeat safe scales, while higher values produce stranger intervals, so 0.7 is a good balance for prototyping. I set max_tokens to 1024, which comfortably fits most 8-16 bar melodies. If you plan to generate multi-track scores, raise the limit.
Because Oxlo.ai uses request-based pricing, sending a long prompt or receiving a lengthy score does not change the cost of the call. That matters when you are iterating on harmonies and feeding prior results back into context. With token-based providers, long ABC strings inflate billing quickly.
def generate_abc(user_prompt: str) -> str:
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_prompt},
        ],
        temperature=0.7,
        max_tokens=1024,
    )
    abc = response.choices[0].message.content.strip()
    # Defensive cleanup: strip markdown fences if the model added them
    if abc.startswith("```"):
        lines = abc.splitlines()
        if lines[0].startswith("```"):
            lines = lines[1:]
        if lines and lines[-1].startswith("```"):
            lines = lines[:-1]
        abc = "\n".join(lines).strip()
    return abc
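The fence cleanup above only helps when the response starts with a fence. If the model adds a sentence of explanation before the notation, a slightly more tolerant extractor can help. This is a sketch under assumptions rather than part of the original script: `extract_abc` is a hypothetical helper that assumes the tune begins at the first `X:` line and ends at a closing fence, if one is present. Trailing prose that is not fenced off is not removed.

```python
import re


def extract_abc(raw: str) -> str:
    """Return the ABC tune found in `raw`, or raise ValueError if none is found."""
    lines = raw.strip().splitlines()
    # Find the first line that looks like an ABC reference number, e.g. "X:1".
    start = next(
        (i for i, line in enumerate(lines) if re.match(r"\s*X:\s*\d+", line)),
        None,
    )
    if start is None:
        raise ValueError("no 'X:' header line found in model output")
    kept = []
    for line in lines[start:]:
        # A closing markdown fence marks the end of the tune.
        if line.strip().startswith("```"):
            break
        kept.append(line.rstrip())
    return "\n".join(kept).strip()
```

Raising on a missing `X:` line makes a bad generation fail early, before the text reaches the MIDI conversion step.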
Step 4: Convert ABC to MIDI
Raw ABC notation is not audio yet. I write it to a temporary file and let music21 parse it. Music21 handles the translation to MIDI automatically. I delete the temporary file after writing the final .mid so the workspace stays clean. Music21 is not the only option, but it is the most reliable for parsing text-based notation without calling external binaries. If you prefer, you could replace this step with a manual note-to-frequency synthesizer using numpy, but that requires handling accidentals, note lengths, and ties yourself. For a shipped tool, music21 saves hours of edge-case handling.
If the ABC header includes a tempo marker like Q:120, music21 respects it during conversion. I do not need to install a full notation suite. This step is where the LLM's creative output turns into real bytes a synthesizer can play.
import os
import tempfile

from music21 import converter


def abc_to_midi(abc_string: str, output_path: str):
    with tempfile.NamedTemporaryFile(mode="w", suffix=".abc", delete=False) as f:
        f.write(abc_string)
        tmp_path = f.name
    try:
        score = converter.parse(tmp_path)
        score.write("midi", fp=output_path)
        print(f"Wrote MIDI to {output_path}")
    finally:
        os.remove(tmp_path)
Step 5: Build the CLI
To make the tool usable outside of a notebook, I wrap everything in a small argparse script. The user passes a prompt and an output path. The script prints a preview of the ABC notation so you can verify the structure before opening the MIDI in a DAW. I keep the CLI minimal, but in production you might wrap the Oxlo.ai call in a retry loop. The API has no cold starts on popular models, so the first request after idle time returns as quickly as the tenth.
import argparse


def main():
    parser = argparse.ArgumentParser(
        description="Generate MIDI from a text prompt via Oxlo.ai"
    )
    parser.add_argument("--prompt", required=True, help="Description of the music you want")
    parser.add_argument("--out", default="output.mid", help="Output MIDI file path")
    args = parser.parse_args()

    print("Composing via Oxlo.ai...")
    abc = generate_abc(args.prompt)
    print("Generated ABC notation:\n")
    print(abc[:500] + ("..." if len(abc) > 500 else ""))
    print("\nConverting to MIDI...")
    abc_to_midi(abc, args.out)


if __name__ == "__main__":
    main()
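The retry loop mentioned above could look something like the following. It is a minimal sketch, not the author's implementation: `with_retries` and its parameters are hypothetical names, and it catches every `Exception` for brevity. In a real tool you would likely narrow that to the network and rate-limit errors raised by the client library.

```python
import time


def with_retries(call, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run `call()` until it succeeds or `attempts` tries have failed."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == attempts:
                raise  # out of tries: surface the last error to the caller
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

In `main()`, the call would become `abc = with_retries(lambda: generate_abc(args.prompt))`. The `sleep` parameter exists only so the delay can be swapped out in tests.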
Run it
Save all the code in a file named music_tool.py, then run:
python music_tool.py --prompt "upbeat chiptune in D major, 140 bpm" --out chiptune.mid
When I ran this against Oxlo.ai, the model returned an ABC block that started like this:
X:1
T:Chiptune Jump
M:4/4
L:1/8
K:D
Q:140
|: DFAd d2 cd | B2 GB A2 F2 | GABc d2 d2 | cdec d4 :|
The terminal output looked like:
Composing via Oxlo.ai...
Generated ABC notation:
X:1
T:Chiptune Jump
M:4/4
L:1/8
K:D
Q:140
|: DFAd d2 cd | B2 GB A2 F2 |...
Converting to MIDI...
Wrote MIDI to chiptune.mid
I opened chiptune.mid in a DAW and the tempo and key were correct. After generation, inspect the ABC header first. A missing K: field will confuse the parser. If the model omits it, add it manually or tighten the system prompt. The MIDI file is standard Type 1, so it imports cleanly into GarageBand, Reaper, or any web-based sequencer.
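The header inspection can be automated before the conversion step. The sketch below is an illustration under assumptions: `missing_header_fields` and `REQUIRED_FIELDS` are hypothetical names, the required set simply mirrors the fields listed in the Step 2 system prompt, and the header is assumed to end at the first line that is not a field, a comment, or blank.

```python
import re

# Fields the Step 2 system prompt asks the model to emit
REQUIRED_FIELDS = ("X", "T", "M", "L", "K", "Q")


def missing_header_fields(
    abc: str, required: tuple[str, ...] = REQUIRED_FIELDS
) -> list[str]:
    """Return the required header fields that do not appear before the tune body."""
    present = set()
    for line in abc.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("%"):
            continue  # skip blank lines and ABC comments
        match = re.match(r"([A-Za-z]):", stripped)
        if match is None:
            break  # first music line reached, so the header is over
        present.add(match.group(1))
    return [field for field in required if field not in present]
```

A non-empty result, such as `["K"]`, is a signal to patch the header by hand or to regenerate before calling `abc_to_midi`.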
Wrap-up and next steps
This pipeline is easy to extend. One concrete next step is adding a second LLM pass that generates a bass line or harmony track based on the first melody. You send the generated ABC back to Oxlo.ai with a new system prompt, and because Oxlo.ai uses flat per-request pricing, feeding long context like a full score does not inflate the cost. See the pricing page for plan details.
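As a rough illustration of that second pass, the helper below builds the message list for a harmony request. Everything here is an assumption for the sake of the sketch: `HARMONY_SYSTEM_PROMPT`, `build_harmony_messages`, and the prompt wording are hypothetical, and the prompt would need tuning against real model output.

```python
# Hypothetical system prompt for the second pass; the wording is an assumption.
HARMONY_SYSTEM_PROMPT = """You are an expert arranger. The user will give you a melody in ABC notation.
Write a bass line that fits it, as valid ABC notation.
Rules:
- Reuse the melody's M:, L:, K: and Q: header values.
- Match the melody's bar count exactly.
- Output ONLY the ABC notation, with no markdown fences or explanation.
"""


def build_harmony_messages(
    melody_abc: str, instructions: str = "Write a simple bass line."
) -> list[dict]:
    """Build chat messages that send the first-pass melody back as context."""
    return [
        {"role": "system", "content": HARMONY_SYSTEM_PROMPT},
        {"role": "user", "content": f"{instructions}\n\nMelody:\n{melody_abc}"},
    ]
```

The returned list can be passed as `messages=` to the same `client.chat.completions.create` call used in Step 3, and the response can go through the same cleanup and conversion steps as the melody.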
Another step is swapping llama-3.3-70b for qwen-3-32b or deepseek-v3.2 to test different compositional styles. Both models are available on Oxlo.ai with the same OpenAI-compatible client, so the change is one string.


