Integrating Oxlo.ai with Other AI Tools and Services: A Step-by-Step Guide

Most production AI stacks are not monoliths. They are pipelines that route prompts across models, vector stores, automation platforms, and monitoring layers. The friction in building these pipelines usually comes from incompatible APIs, hidden token costs, and cold starts that break synchronous workflows. Oxlo.ai removes that friction by offering a fully OpenAI-compatible inference platform with request-based pricing, a broad model catalog, and zero cold starts on popular models. Because the Oxlo.ai API follows the same schema as OpenAI, you can integrate it into existing toolchains by changing a single line of configuration: the base URL.

Drop-in Replacement with the OpenAI SDK

The fastest way to integrate Oxlo.ai is through the official OpenAI SDKs. Oxlo.ai exposes the standard chat/completions, embeddings, images/generations, audio/transcriptions, and audio/speech endpoints at https://api.oxlo.ai/v1. This means any Python, Node.js, or cURL script written for OpenAI works without structural changes.
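Under the SDKs, each call is an ordinary HTTPS POST to a path under that base URL. The sketch below uses only the Python standard library to build, without sending, the chat/completions request an OpenAI-style client would make. It assumes Oxlo.ai accepts the standard OpenAI bearer-token Authorization header and JSON body, which its OpenAI compatibility implies but this guide does not spell out. The helper names endpoint and build_chat_request are hypothetical.

```python
import json
import urllib.request

# Base URL from this guide; every OpenAI-style endpoint path hangs off it.
OXLO_BASE_URL = "https://api.oxlo.ai/v1"


def endpoint(path: str) -> str:
    """Join an OpenAI-style path (e.g. 'chat/completions') onto the base URL."""
    return f"{OXLO_BASE_URL.rstrip('/')}/{path.lstrip('/')}"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) the POST an OpenAI-compatible client would make.

    Assumption: Oxlo.ai uses OpenAI's bearer-token auth and JSON body schema.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        endpoint("chat/completions"),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # The five endpoints listed above all resolve under the same base URL.
    for path in ["chat/completions", "embeddings", "images/generations",
                 "audio/transcriptions", "audio/speech"]:
        print(endpoint(path))

    req = build_chat_request("YOUR_OXLO_API_KEY", "llama-3.3-70b",
                             "Explain request-based pricing.")
    print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would actually send it. In practice the SDK builds and sends this for you, which is why swapping the base URL is the only change needed.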

Here is a minimal Python example that routes requests to Llama 3.3 70B:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key="YOUR_OXLO_API_KEY"
)

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Explain request-based pricing."}],
    stream=False
)
print(response.choices[0].message.content)

In Node.js, the pattern is identical:

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.oxlo.ai/v1',
  apiKey: process.env.OXLO_API_KEY,
});

const stream = await client.chat.completions.create({
  model: 'deepseek-r1-671b',
  messages: [{ role: 'user', content: 'Write a React hook.' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
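The Node.js loop above relies on the server-sent-event (SSE) stream format that OpenAI clients parse for you. If you ever consume the stream without an SDK, for example from a proxy or a language without an official client, a minimal Python parser could look like this. It assumes Oxlo.ai emits OpenAI-style "data:" lines ending with a "data: [DONE]" sentinel; the sample lines are hand-written for illustration, not captured from the API.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from OpenAI-style server-sent-event lines."""
    for raw in lines:
        line = raw.strip()
        # Blank separators, comments, and non-data fields are skipped.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # end-of-stream sentinel
            break
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        # Same idea as chunk.choices[0]?.delta?.content || '' in the Node.js loop.
        delta = choices[0].get("delta") or {}
        text = delta.get("content")
        if text:
            yield text


if __name__ == "__main__":
    sample = [
        'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
        "",
        'data: {"choices": [{"index": 0, "delta": {"content": "use"}}]}',
        "",
        'data: {"choices": [{"index": 0, "delta": {"content": "State"}}]}',
        "",
        "data: [DONE]",
    ]
    print("".join(iter_stream_text(sample)))  # prints useState
```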

Because Oxlo.ai supports streaming, JSON mode, function calling, vision, and multi-turn conversations out of the box, you do not lose functionality when you switch the base URL.
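JSON mode is a good example of a feature that travels with the request body rather than the client. On OpenAI it is enabled with response_format={"type": "json_object"}; the sketch below assumes Oxlo.ai honors that parameter the same way. The helpers json_mode_kwargs and parse_json_reply are hypothetical, and the returned dictionary is meant to be unpacked into the client.chat.completions.create call shown earlier.

```python
import json
from typing import Any


def json_mode_kwargs(model: str, question: str) -> dict[str, Any]:
    """Keyword arguments for client.chat.completions.create in JSON mode.

    Assumption: Oxlo.ai accepts OpenAI's response_format parameter unchanged.
    """
    return {
        "model": model,
        "response_format": {"type": "json_object"},
        "messages": [
            # OpenAI's JSON mode expects the prompt itself to ask for JSON.
            {"role": "system", "content": "Reply with a single JSON object."},
            {"role": "user", "content": question},
        ],
    }


def parse_json_reply(content: str) -> dict[str, Any]:
    """Parse the model's reply, failing loudly if it is not a JSON object."""
    data = json.loads(content)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data


if __name__ == "__main__":
    kwargs = json_mode_kwargs("llama-3.3-70b", 'List two pricing models as {"models": [...]}.')
    print(kwargs["response_format"])
    # With the client configured earlier:
    # reply = client.chat.completions.create(**kwargs).choices[0].message.content
    # print(parse_json_reply(reply))
```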

Connecting Orchestration Frameworks

Frameworks like LangChain and LlamaIndex include OpenAI-compatible client classes, so you can point them at Oxlo.ai by overriding the base URL and API key in the constructor.

With LangChain in Python:

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="qwen3-32b",
    base_url="https://api.oxlo.ai/v1",
    api_key="YOUR_OXLO_API_KEY",
    temperature=0.2
)

llm.invoke("Draft an email to the engineering team.")

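For LlamaIndex, use the OpenAI-compatible OpenAILike class from the llama-index-llms-openai-like package. The stock OpenAI class looks up model metadata by OpenAI model name and can reject other model IDs:

from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    model="kimi-k2-6",
    api_key="YOUR_OXLO_API_KEY",
    api_base="https://api.oxlo.ai/v1",
    is_chat_model=True  # send .complete() calls through chat/completions
)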

response = llm.complete("Summarize the following legal text...")

This compatibility extends to retrieval-augmented generation chains, agents, and evaluation pipelines. You can mix Oxlo.ai models for different subtasks, such as routing simple queries to DeepSeek V4 Flash and complex reasoning to DeepSeek R1 671B MoE, all within the same framework.
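One way to split subtasks is a small router in front of the framework. The sketch below is hypothetical throughout: the keyword pattern and the 400-character threshold are arbitrary choices, and while deepseek-r1-671b is the ID used earlier in this guide, the DeepSeek V4 Flash ID is a placeholder to replace with the one in Oxlo.ai's model list.

```python
import re

FAST_MODEL = "deepseek-v4-flash"      # hypothetical ID for DeepSeek V4 Flash
REASONING_MODEL = "deepseek-r1-671b"  # ID used earlier in this guide

# Hypothetical heuristic: whole words or phrases that hint at multi-step reasoning.
REASONING_PATTERN = re.compile(
    r"\b(prove|derive|step by step|plan|debug|why)\b", re.IGNORECASE
)


def pick_model(query: str, max_simple_chars: int = 400) -> str:
    """Route short, plain queries to the fast model and the rest to the reasoner."""
    if len(query) > max_simple_chars or REASONING_PATTERN.search(query):
        return REASONING_MODEL
    return FAST_MODEL


if __name__ == "__main__":
    for q in ["What is the capital of France?",
              "Debug this race condition step by step."]:
        print(q, "->", pick_model(q))
```

The returned ID can be passed straight to the LangChain constructor shown earlier, for example ChatOpenAI(model=pick_model(query), base_url="https://api.oxlo.ai/v1", api_key=...). A production router would more likely use a classifier or a cheap model call than a keyword list.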

Building Agentic Workflows with Function Calling

Modern agents rely on tool use loops. Each loop can inflate prompt length as prior tool outputs are fed back into context. On token-based providers, this makes agentic workloads expensive and unpredictable. Oxlo.ai uses request-based pricing: one flat cost per API request regardless of prompt length. That makes long-context agent loops significantly cheaper than token-based alternatives such as Together AI, Fireworks AI, OpenRouter, Replicate, or Anyscale.
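To see why context growth matters, count prompt tokens across a loop in which every step resends the original prompt plus all earlier tool outputs. The sketch assumes a fixed size for the base prompt and for each tool output (2,000 and 3,000 tokens here, chosen purely for illustration) and ignores output tokens; it compares quantities, not any provider's prices.

```python
def total_prompt_tokens(steps: int, base_tokens: int, tool_tokens: int) -> int:
    """Prompt tokens summed over an agent loop whose context grows each step.

    Step i (0-based) resends the base prompt plus i earlier tool outputs, so
    the total is steps * base + tool * (0 + 1 + ... + (steps - 1)).
    """
    return steps * base_tokens + tool_tokens * steps * (steps - 1) // 2


if __name__ == "__main__":
    for steps in (5, 10, 20):
        tokens = total_prompt_tokens(steps, base_tokens=2_000, tool_tokens=3_000)
        # Request count grows linearly; prompt tokens grow quadratically.
        print(f"{steps:>2} requests -> {tokens:>7,} prompt tokens")
```

With these sample sizes, going from 10 to 20 steps doubles the number of requests but raises total prompt tokens from 155,000 to 610,000, roughly 3.9 times. Per-token billing follows the quadratic token count, while per-request billing follows the linear step count.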

Oxlo.ai supports function calling across its chat models. Below is a Python pattern that registers a weather tool:

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"]
            }
        }
    }
]

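response = client.chat.completions.create(
    model="glm-5",
    # Example prompt that should trigger the tool
    messages=[{"role": "user", "content": "What's the weather in Paris right now?"}],
    tools=tools
)

# When the model decides to use the tool, the call arrives here
# instead of as plain text content.
print(response.choices[0].message.tool_calls)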
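Completing the loop means executing each requested tool locally and sending its result back as a "tool" message tied to the call's id. A minimal sketch, assuming Oxlo.ai returns tool calls in OpenAI's shape (an id, a function name, and a JSON-encoded arguments string); get_weather is a hypothetical stub and run_tool_calls a hypothetical helper.

```python
import json
from typing import Any, Callable


def get_weather(location: str) -> dict[str, Any]:
    """Hypothetical stub; replace with a real weather lookup."""
    return {"location": location, "forecast": "sunny", "temp_c": 21}


# Maps tool names declared in `tools` to local Python functions.
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(tool_calls: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Execute each requested tool and build the role="tool" reply messages."""
    replies = []
    for call in tool_calls:
        fn = TOOL_REGISTRY[call["function"]["name"]]
        # Arguments arrive as a JSON string, not a dict.
        args = json.loads(call["function"]["arguments"] or "{}")
        result = fn(**args)
        replies.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result),
        })
    return replies


if __name__ == "__main__":
    sample_call = {
        "id": "call_0",
        "type": "function",
        "function": {"name": "get_weather",
                     "arguments": json.dumps({"location": "Paris"})},
    }
    print(run_tool_calls([sample_call]))
```

With the OpenAI SDK, convert each entry of response.choices[0].message.tool_calls with .model_dump(), append the assistant message and the returned tool messages to the conversation, and call client.chat.completions.create again with the same tools so the model can write its final answer. Each extra round trip is one more request, which is where flat per-request pricing applies.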
