Building Chemistry Tools with LLMs: A Step-by-Step Guide

We are building a command-line chemistry assistant that ingests SMILES strings, validates them with RDKit, computes molecular weight and LogP, and returns a concise markdown summary. This tool is aimed at medicinal chemists and cheminformatics developers who need rapid compound analysis without leaving their terminal or paying for heavy desktop licenses.

What you'll need

Before we start, grab an API key from https://portal.oxlo.ai. Oxlo.ai offers a free tier with 60 requests per day, which is plenty for testing this agent. You will also need Python 3.10 or newer, the OpenAI SDK, and RDKit. RDKit is the standard open-source cheminformatics toolkit, and you can install the Python dependencies with pip:

pip install openai rdkit

If you are on an Apple Silicon (M-series) Mac and the pip install fails, use conda install -c conda-forge rdkit instead. I recommend exporting your Oxlo.ai key as an environment variable so it is never hard-coded in a script you might share or commit.

export OXLO_API_KEY="..."

Step 1: Bootstrap the Oxlo.ai client

Oxlo.ai exposes a fully OpenAI-compatible API, so we can use the official SDK and only change the base URL and model name. I am using llama-3.3-70b here because it handles tool calling reliably for structured chemistry tasks. If you later need deeper reasoning for reaction mechanisms, you can swap in deepseek-v3.2 or qwen-3-32b without changing any other code.

One practical note: SMILES strings for large molecules can get long, and agentic loops add context quickly. Oxlo.ai uses flat per-request pricing, so those long inputs and extra tool-turn messages do not inflate your bill the way token-based metering would. That makes iterative chemistry exploration far more predictable.

import os
from openai import OpenAI

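client = OpenAI(
    base_url="https://api.oxlo.ai/v1",
    # os.environ[...] fails fast with a KeyError if the key is unset. With
    # .get(), a None api_key makes the SDK fall back to OPENAI_API_KEY instead.
    api_key=os.environ["OXLO_API_KEY"],
)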

MODEL = "llama-3.3-70b"
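Because pricing (and the free-tier quota mentioned earlier) is counted per request rather than per token, the number worth knowing is how many round trips one analysis costs. The sketch below is a back-of-the-envelope helper that assumes the model follows the Step 4 workflow exactly: one request that returns the validate_smiles call, one that returns both property calls in parallel, and one that writes the summary. The function names are hypothetical and not part of any SDK.

```python
# Hypothetical budgeting helper. Assumes the model follows the Step 4 workflow:
# validate -> property tools -> final summary, one chat-completion request per step.

def requests_per_analysis(valid: bool = True, parallel_properties: bool = True) -> int:
    """Estimate how many requests one run of the agent loop makes."""
    if not valid:
        # validate_smiles call, then a final answer explaining the error
        return 2
    # Both property tools in one turn if called in parallel, otherwise one turn each.
    property_turns = 1 if parallel_properties else 2
    # validate turn + property turn(s) + final summary turn
    return 1 + property_turns + 1


def analyses_per_day(daily_quota: int, **kwargs) -> int:
    """Number of complete analyses that fit in a daily request quota."""
    return daily_quota // requests_per_analysis(**kwargs)


print(analyses_per_day(60))                             # 20
print(analyses_per_day(60, parallel_properties=False))  # 15
print(analyses_per_day(60, valid=False))                # 30
```

Under these assumptions the cost of a run depends on how many tool turns the model takes, not on how long the SMILES string is.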

Step 2: Build the cheminformatics tools

Next we write three small Python functions that wrap RDKit. These do the actual numerical work so the LLM does not hallucinate molecular properties. Molecular weight is needed to convert between mass and molar amounts in dosing calculations, and LogP, the octanol-water partition coefficient, measures lipophilicity, a major factor in how readily a compound crosses biological membranes. Each function accepts a SMILES string and returns a plain dictionary that the model can read.

from rdkit import Chem
from rdkit.Chem import Descriptors, Crippen

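def validate_smiles(smiles: str) -> dict:
    mol = Chem.MolFromSmiles(smiles)
    # MolFromSmiles returns None on a parse error; an empty string parses
    # to a molecule with zero atoms, so treat that as invalid too.
    if mol is None or mol.GetNumAtoms() == 0:
        return {"valid": False, "error": "RDKit could not parse SMILES"}
    return {"valid": True, "canonical": Chem.MolToSmiles(mol)}

def get_molecular_weight(smiles: str) -> dict:
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return {"error": "Invalid SMILES"}
    # Average molecular weight in g/mol, implicit hydrogens included.
    return {"mw": round(Descriptors.MolWt(mol), 3)}

def get_logp(smiles: str) -> dict:
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return {"error": "Invalid SMILES"}
    # Wildman-Crippen atom-contribution estimate of LogP (computed, not measured).
    return {"logp": round(Crippen.MolLogP(mol), 3)}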
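Descriptors.MolWt reports the average molecular weight: the sum of standard atomic weights over every atom, implicit hydrogens included. As a quick sanity check on the tool's output, you can reproduce the aspirin value (C9H8O4) by hand. The atomic weights below are the conventional IUPAC values for C, H, and O, hard-coded for illustration; RDKit uses its own periodic-table data, so treat this as a cross-check rather than a replacement for the tool.

```python
# Worked check of average molecular weight from an element -> count formula.
# Atomic weights are conventional IUPAC values, hard-coded for illustration.
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "O": 15.999}


def average_mw(formula: dict[str, int]) -> float:
    """Sum atomic weight x atom count over every element in the formula."""
    return sum(ATOMIC_WEIGHTS[element] * count for element, count in formula.items())


aspirin = {"C": 9, "H": 8, "O": 4}  # C9H8O4
# 9 * 12.011 + 8 * 1.008 + 4 * 15.999 = 108.099 + 8.064 + 63.996
print(round(average_mw(aspirin), 3))  # 180.159
```

The result agrees with the 180.159 that get_molecular_weight reports for aspirin in the example output.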

Step 3: Register tool schemas

The LLM needs JSON schemas to know when to invoke our functions. We define one schema per tool and pass them in the tools parameter of the chat completion call. Keep descriptions explicit; the model uses them to decide what to call. I name the functions with get_ and validate_ prefixes so the intent is obvious from the schema alone.

tools = [
    {
        "type": "function",
        "function": {
            "name": "validate_smiles",
            "description": "Check if a SMILES string is chemically valid and return the canonical form.",
            "parameters": {
                "type": "object",
                "properties": {
                    "smiles": {"type": "string", "description": "The SMILES string to validate"}
                },
                "required": ["smiles"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_molecular_weight",
            "description": "Calculate the molecular weight of a valid SMILES string.",
            "parameters": {
                "type": "object",
                "properties": {
                    "smiles": {"type": "string", "description": "The SMILES string"}
                },
                "required": ["smiles"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_logp",
            "description": "Calculate the partition coefficient (LogP) of a valid SMILES string.",
            "parameters": {
                "type": "object",
                "properties": {
                    "smiles": {"type": "string", "description": "The SMILES string"}
                },
                "required": ["smiles"]
            }
        }
    }
]
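The three schemas above differ only in name and description. If you plan to add more single-argument RDKit tools, a small factory keeps them consistent. smiles_tool is a hypothetical helper (it is not part of the OpenAI SDK); it builds the same dictionaries as the literal list.

```python
def smiles_tool(name: str, description: str,
                arg_description: str = "The SMILES string") -> dict:
    """Build a function-tool schema whose only argument is a SMILES string."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": {
                    "smiles": {"type": "string", "description": arg_description}
                },
                "required": ["smiles"],
            },
        },
    }


# Equivalent to the hand-written list: one call per tool.
tools = [
    smiles_tool("validate_smiles",
                "Check if a SMILES string is chemically valid and return the canonical form.",
                arg_description="The SMILES string to validate"),
    smiles_tool("get_molecular_weight",
                "Calculate the molecular weight of a valid SMILES string."),
    smiles_tool("get_logp",
                "Calculate the partition coefficient (LogP) of a valid SMILES string."),
]
```

Adding a new descriptor tool then takes one extra smiles_tool(...) entry plus the matching Python function.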

Step 4: Write the system prompt

The system prompt is the agent's instruction manual. I keep it strict: validate first, then compute properties, then summarize. This prevents the model from skipping steps or guessing values when RDKit can compute them exactly. I also cap the explanation length so the output stays scannable on a small terminal.

SYSTEM_PROMPT = """You are a computational chemistry assistant. Your job is to help chemists analyze molecules provided as SMILES strings.

When a user gives you a SMILES string, follow this exact workflow:
1. Call validate_smiles to ensure the input is chemically valid.
2. If valid, call get_molecular_weight and get_logp in parallel.
3. Summarize the results in a concise markdown table with columns: Property, Value.
4. If the SMILES is invalid, explain the likely issue and ask for a corrected string.

Do not guess values. Only use the tool outputs. Keep explanations under three sentences."""
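The system prompt asks for an order, but nothing forces the model to follow it. If you want to verify that it validated first, one option is to record the tool names requested in each assistant turn (for example [tc.function.name for tc in message.tool_calls] inside the loop) and check the sequence afterwards. follows_workflow is a hypothetical helper that encodes only the ordering rule from the prompt above.

```python
PROPERTY_TOOLS = {"get_molecular_weight", "get_logp"}


def follows_workflow(batches: list[list[str]]) -> bool:
    """Return True if per-turn tool-call batches follow the prompt's order.

    `batches` holds the tool names requested in each assistant turn, e.g.
    [["validate_smiles"], ["get_molecular_weight", "get_logp"]].
    Rule: the first turn calls only validate_smiles, and any later turns
    request property tools only.
    """
    if not batches or batches[0] != ["validate_smiles"]:
        return False
    return all(set(batch) <= PROPERTY_TOOLS for batch in batches[1:])


print(follows_workflow([["validate_smiles"], ["get_molecular_weight", "get_logp"]]))  # True
print(follows_workflow([["get_logp"]]))  # False: skipped validation
```

An invalid SMILES produces just [["validate_smiles"]], which also passes, since the prompt tells the model to stop and ask for a corrected string.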

Step 5: Wire the tool execution loop

Now we connect the client to the tools. The loop sends the user message to Oxlo.ai, checks for tool_calls in the response, executes the matching Python function, and feeds the result back as a tool message. We append each assistant message to the conversation history before its tool results, so subsequent turns retain context and every tool message follows the assistant turn whose tool_call_id it answers, which OpenAI-compatible APIs expect.

The loop exits when the model returns a final answer instead of more tool calls. Tool message content must be a string, so I serialize results with json.dumps; that also keeps numbers and booleans in valid JSON form (true rather than Python's True) for the model to read.

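import json

# Map each tool name from the schemas to the Python function that implements it.
TOOL_FUNCTIONS = {
    "validate_smiles": validate_smiles,
    "get_molecular_weight": get_molecular_weight,
    "get_logp": get_logp,
}

def run_chemistry_agent(user_message: str, max_turns: int = 6) -> str:
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_message},
    ]

    # Bound the number of requests so a model that keeps asking for tools
    # cannot loop forever; the normal workflow needs about three.
    for _ in range(max_turns):
        response = client.chat.completions.create(
            model=MODEL,
            messages=messages,
            tools=tools,
            tool_choice="auto",
        )

        message = response.choices[0].message
        assistant_msg = {
            "role": message.role,
            "content": message.content or "",
        }
        if message.tool_calls:
            assistant_msg["tool_calls"] = [
                tc.model_dump() for tc in message.tool_calls
            ]
        # The assistant turn must come before the tool messages that answer it.
        messages.append(assistant_msg)

        if not message.tool_calls:
            return message.content or ""

        for tc in message.tool_calls:
            fn_name = tc.function.name
            fn = TOOL_FUNCTIONS.get(fn_name)
            try:
                args = json.loads(tc.function.arguments)
                if fn is None:
                    result = {"error": f"Unknown function: {fn_name}"}
                else:
                    result = fn(**args)
            except (json.JSONDecodeError, TypeError) as exc:
                # Malformed JSON or unexpected argument names: report the
                # problem to the model instead of crashing the loop.
                result = {"error": f"Bad arguments for {fn_name}: {exc}"}

            messages.append({
                "role": "tool",
                "tool_call_id": tc.id,
                "name": fn_name,
                "content": json.dumps(result),
            })

    return "Stopped: the model did not finish within max_turns requests."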
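When a tool loop misbehaves, a common cause is a history in which some tool_call id never receives a matching tool message; OpenAI-compatible endpoints generally reject the next request in that case. The hypothetical helper below inspects a plain-dict history shaped like the one run_chemistry_agent builds (assistant messages with "tool_calls", tool messages with "tool_call_id") and reports unanswered calls, which can be handy to print before sending a request while debugging.

```python
def unanswered_tool_calls(messages: list[dict]) -> list[str]:
    """Return ids of assistant tool calls that have no matching tool message.

    Raises ValueError if a tool message answers an id no earlier assistant
    message requested.
    """
    pending: list[str] = []
    for msg in messages:
        if msg["role"] == "assistant":
            # Assistant turns without tool calls have no "tool_calls" key.
            pending.extend(tc["id"] for tc in msg.get("tool_calls") or [])
        elif msg["role"] == "tool":
            call_id = msg["tool_call_id"]
            if call_id not in pending:
                raise ValueError(f"tool message answers unknown call {call_id!r}")
            pending.remove(call_id)
    return pending


history = [
    {"role": "user", "content": "Analyze CCO"},
    {"role": "assistant", "content": "", "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "validate_smiles", "arguments": '{"smiles": "CCO"}'}},
    ]},
    {"role": "tool", "tool_call_id": "call_1", "content": '{"valid": true}'},
]
print(unanswered_tool_calls(history))       # []
print(unanswered_tool_calls(history[:-1]))  # ['call_1']
```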

Run it

Save everything in a file named chemistry_agent.py and run it. I will analyze aspirin and a deliberately broken SMILES string to show both success and graceful error handling.

if __name__ == "__main__":
    queries = [
        "Analyze CC(=O)Oc1ccccc1C(=O)O",
        "Analyze C1CCOC1X"
    ]
    for q in queries:
        print(f"User: {q}")
        print(run_chemistry_agent(q))
        print("-" * 40)

Example output for the aspirin query (the model's wording and table layout will vary between runs):

User: Analyze CC(=O)Oc1ccccc1C(=O)O

| Property | Value |
|----------|-------|
| Valid | True |
| Canonical | CC(=O)Oc1ccccc1C(=O)O |
| Molecular Weight | 180.159 |
| LogP | 1.31 |

Aspirin has a moderate molecular weight and favorable lipophilicity for oral absorption.

For the invalid input, the agent reports the validation failure and asks for a corrected string rather than crashing.

Next steps

From here, extend the tool set to compute tPSA and Lipinski Rule of Five violations by adding new RDKit wrappers and schemas. You can also add a second agent step that suggests retrosynthetic routes, or connect to the PubChem PUG-REST API to pull known assay data.
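For the Rule of Five extension, RDKit already provides the missing descriptors (for example Descriptors.NumHDonors, Descriptors.NumHAcceptors, and Descriptors.TPSA for tPSA), so a new tool mostly needs the counting logic. The sketch below keeps that logic separate from RDKit so it is easy to test; the function name and output keys are hypothetical, and the "at most one violation" pass criterion is the common convention rather than something this guide defines.

```python
def lipinski_violations(mw: float, logp: float, h_donors: int, h_acceptors: int) -> dict:
    """Count Rule of Five violations from precomputed descriptors.

    Thresholds: MW <= 500, LogP <= 5, H-bond donors <= 5, H-bond acceptors <= 10.
    """
    rules = {
        "mw_over_500": mw > 500,
        "logp_over_5": logp > 5,
        "hbd_over_5": h_donors > 5,
        "hba_over_10": h_acceptors > 10,
    }
    failed = [name for name, broken in rules.items() if broken]
    return {
        "violations": len(failed),
        "failed_rules": failed,
        # Common convention: a compound passes with at most one violation.
        "passes": len(failed) <= 1,
    }


# Illustrative descriptor values, not a real compound.
print(lipinski_violations(mw=650.0, logp=6.2, h_donors=2, h_acceptors=12))
# {'violations': 3, 'failed_rules': ['mw_over_500', 'logp_over_5', 'hba_over_10'], 'passes': False}
```

To expose it to the agent, compute the four descriptors from the parsed molecule inside an RDKit wrapper and register a schema for it like the ones in Step 3.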

If you want to share this with a lab, wrap run_chemistry_agent in a small Gradio app so researchers can paste SMILES strings in a browser. For workloads involving very long polymer SMILES or multi-step agentic planning, consider switching to qwen-3-32b or kimi-k2.6 on Oxlo.ai. Both handle extended context and tool chaining well, and the flat per-request pricing keeps batch processing costs predictable regardless of input length. You can compare plans at https://oxlo.ai/pricing.
