# Oxlo.ai

> Oxlo.ai is a developer-first AI inference platform. Run frontier models like Kimi K2.6 and DeepSeek V4 Flash on an OpenAI-compatible API, with request-based pricing that is predictable and cheaper for long-context workloads than token-based providers. Oxlo.ai never sells your data and never trains on your prompts.

## Core Value Proposition

Oxlo.ai rests on three pillars:

- **Frontier performance**: Models like Kimi K2.6 match or beat the top labs on selected public benchmarks.
- **Lower cost**: Request-based pricing means one flat cost per API call, regardless of token count.
- **Total privacy**: We never sell your data and never train on your prompts.

The API is OpenAI SDK compatible, so switching takes one line of code.

## Benchmarks (Kimi K2.6 vs the frontier labs)

Kimi K2.6 is available on Oxlo.ai and competes directly with GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro. Selected results where Kimi K2.6 leads the reported field (percent, higher is better):

- DeepSearchQA (f1-score): Kimi K2.6 92.5, Claude Opus 4.6 91.3, Kimi K2.5 89.0, Gemini 3.1 Pro 81.9, GPT-5.4 78.6
- DeepSearchQA (accuracy): Kimi K2.6 83.0, Claude Opus 4.6 80.6, Kimi K2.5 77.1, GPT-5.4 63.7, Gemini 3.1 Pro 60.2
- HLE-Full with tools: Kimi K2.6 54.0, Claude Opus 4.6 53.0, GPT-5.4 52.1, Gemini 3.1 Pro 51.4, Kimi K2.5 50.2
- SWE-Bench Pro: Kimi K2.6 58.6, GPT-5.4 57.7, Gemini 3.1 Pro 54.2, Claude Opus 4.6 53.4, Kimi K2.5 50.7

Source: Moonshot AI, https://www.kimi.com/blog/kimi-k2-6

## Docs

- [Getting Started](https://docs.oxlo.ai/docs/quickstart): Set up your first API call in under 2 minutes
- [API Reference](https://docs.oxlo.ai/docs/api/parameters): Full endpoint and parameter docs
- [Text Generation](https://docs.oxlo.ai/docs/capabilities/text-generation): Chat completions (OpenAI-compatible)
- [Vision Models](https://docs.oxlo.ai/docs/capabilities/vision-models): Image understanding with Gemma 3 and Kimi VL
- [Image Generation](https://docs.oxlo.ai/docs/capabilities/image-generation): Generate images with SDXL, Flux, and Oxlo Image Pro
- [Embeddings](https://docs.oxlo.ai/docs/capabilities/embeddings): BGE-Large and E5-Large embedding models
- [Speech to Text](https://docs.oxlo.ai/docs/capabilities/speech-to-text): Whisper-based audio transcription
- [Text to Speech](https://docs.oxlo.ai/docs/capabilities/text-to-speech): Kokoro 82M TTS
- [Object Detection](https://docs.oxlo.ai/docs/capabilities/object-detection): YOLOv9 and YOLOv11
- [Pricing](https://oxlo.ai/pricing): Full pricing table
- [Models](https://oxlo.ai/models): Complete model registry with live status

## Available Models

### Large Language Models (Chat/Reasoning)
- Qwen 3 32B: State-of-the-art multilingual reasoning, agent tasks, and code generation (Premium)
- Llama 3.3 70B: Meta's flagship 70B parameter general-purpose LLM (Premium)
- DeepSeek R1 671B: Deep reasoning and complex coding tasks - full 671B MoE model (Premium)
- DeepSeek R1 0528: Latest DeepSeek R1 iteration with improved reasoning (Premium)
- GPT-OSS 120B: Large-scale open-weight GPT model from OpenAI (Premium)
- Kimi K2 Thinking: Advanced reasoning with chain-of-thought (Premium)
- Kimi K2.5: Kimi reasoning model, predecessor to K2.6 (Premium)
- Kimi K2.6: Latest multimodal reasoning model with video input and 131K context (Premium)
- Falcon 11B: Enhanced reasoning and text generation (Pro)
- Falcon 7B: Efficient, reliable text generation (Free)
- DeepSeek V3: Fast general-purpose inference (Free)
- DeepSeek V3.2: Improved coding and reasoning (Free)
- Mistral 7B v0.3: Fast and efficient for lightweight tasks (Free)
- Llama 3.2 3B: Compact but capable (Free)
- Gemma 3 4B: Google's efficient small model with vision support (Free)
- Qwen 2.5 7B: Strong multilingual 7B model (Pro)
- Llama 3.1 8B: Versatile 8B model (Pro)
- Mistral Small 24B: Mid-range for balanced performance (Pro)
- Qwen 3 14B: Mid-size Qwen with great reasoning (Pro)
- Llama 4 Maverick 17B: Meta's latest architecture (Pro)
- DeepSeek Coder 33B: Specialised coding model (Pro)
- Ministral 3 14B: Efficient mid-range model (Pro)
- MiniMax M2.5: MoE model for coding, agentic tool use, and complex workflows (Premium)
- GLM 5: 744B MoE model for systems engineering and long-horizon agentic tasks (Premium)

### Vision Models
- Gemma 3 27B: Google's 27B vision-language model (Premium)
- Gemma 3 4B: Compact vision-language model (Free)
- Kimi VL A3B: Compact multimodal vision model (Pro)

### Code Models
- Qwen 3 Coder 30B: Specialised coding model with 30B parameters (Premium)
- DeepSeek Coder: Code generation and understanding (Pro)
- Qwen 2.5 Coder 7B: Accurate coding assistance and debugging (Pro)
- Oxlo Coder Fast: High-speed code generation and completion (Pro)

### Image Generation
- Oxlo Image Pro: Premium Flux 2-based image generation (Premium)
- Oxlo Image Ultra: Highest-quality image generation (Premium)
- Stable Diffusion 3.5 Large: High-quality open-source image gen (Premium)
- SDXL Lightning: Fast image generation (Pro)
- Stable Diffusion 1.5: Lightweight image generation (Free)
- Flux.1 Schnell: Fast Flux-based generation (Pro)

### Audio / Speech
- Whisper Large v3: OpenAI's best transcription model (Free)
- Whisper Turbo: Fastest transcription (Free)
- Whisper Medium: Mid-range transcription (Free)
- Kokoro 82M: Natural-sounding text-to-speech (Free)

### Embeddings
- BGE-Large: BAAI's top-performing text embedding model (Free)
- E5-Large: Microsoft's multilingual embedding model (Free)

### Object Detection
- YOLOv9: State-of-the-art real-time object detection (Free)
- YOLOv11: Latest YOLO architecture (Free)

## Pricing

Request-based pricing. No token counting. No variable billing. One price per request regardless of prompt length.

| Plan | Price | Requests/Day | Max Output Tokens | Concurrency |
|------|-------|--------------|-------------------|-------------|
| Free | $0/mo | 60 | 4,096 | 1 |
| Pro | $80/mo | 1,000 | 8,192 | 20 |
| Premium | $350/mo | 5,000 | 32,768 | 50 |
| Enterprise | Custom | Unlimited | Custom | Custom |

All plans include a 7-day free trial with full access to every model.
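
To make the flat-rate model concrete, the sketch below derives the lowest possible cost per request for each fixed-price plan from the table above. It assumes a 30-day month and a fully used daily quota; neither is stated in the pricing table, and unused quota raises the effective cost. The helper names and the `break_even_tokens` function are illustrative only, and the per-token price passed to it is a placeholder you supply, not a quote from any provider.

```python
# Plan figures copied from the pricing table above.
PLANS = {
    "Free": {"usd_per_month": 0, "requests_per_day": 60},
    "Pro": {"usd_per_month": 80, "requests_per_day": 1_000},
    "Premium": {"usd_per_month": 350, "requests_per_day": 5_000},
}

DAYS_PER_MONTH = 30  # assumption: the billing month length is not stated


def cost_per_request(plan: str) -> float:
    """Lowest possible USD cost per request, i.e. with the daily quota fully used."""
    p = PLANS[plan]
    return p["usd_per_month"] / (p["requests_per_day"] * DAYS_PER_MONTH)


def break_even_tokens(plan: str, usd_per_million_tokens: float) -> float:
    """Tokens per request above which the flat cost beats a single blended per-token price."""
    return cost_per_request(plan) / usd_per_million_tokens * 1_000_000


for name in PLANS:
    print(f"{name}: ${cost_per_request(name):.6f} per request at full utilization")
```

Under these assumptions Pro works out to roughly $0.002667 per request and Premium to roughly $0.002333. Treating token pricing as one blended rate is a simplification, since token-based providers usually price input and output tokens separately.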

### Enterprise Pricing Guarantee
Custom pricing is available for Enterprise users. We guarantee a 30% discount on your current provider's API pricing for equivalent models.

## Privacy and Data Handling

Oxlo.ai never sells your data and never uses your prompts or outputs to train models. Your inputs stay yours. Inference requests are processed only to return your response.

## Key Differentiators

- **Frontier performance**: Models like Kimi K2.6 match or beat GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro on many benchmarks.
- **Privacy by default**: We never sell your data and never train on your prompts.
- **Request-based pricing**: Pay per API call, not per token. A 100-token prompt and a 10,000-token prompt cost the same.
- **No cold starts**: All popular models stay loaded in GPU memory for instant inference.
- **OpenAI SDK drop-in replacement**: Change one line of code to switch from OpenAI, Together AI, or any compatible provider.
- **45+ models across 7 categories**: LLMs, vision, code, image gen, audio, embeddings, and detection.
- **7-day free trial**: Full access to every model, no credit card required.

## API Details

- Base URL: `https://api.oxlo.ai/v1`
- Compatibility: OpenAI-compatible API; works with the OpenAI SDKs for Python and Node.js, and with plain HTTP clients such as cURL
- Authentication: Bearer token via API key
- Endpoints: `/chat/completions`, `/embeddings`, `/images/generations`, `/audio/transcriptions`, `/audio/speech`
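
For readers not using an SDK, the details above can be combined into a raw HTTP request. The following is a minimal sketch using only the Python standard library: it builds the request without sending it, and it assumes the request body follows the usual OpenAI chat-completions shape (`model`, `messages`). The `build_chat_request` helper is a hypothetical name introduced for illustration.

```python
import json
import urllib.request

BASE_URL = "https://api.oxlo.ai/v1"  # base URL from the API details above


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /chat/completions with Bearer auth."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("YOUR_API_KEY", "qwen-3-32b", "Hello!")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req), then json.load() the response.
```

Keeping request construction separate from sending makes it easy to inspect the URL and headers before any quota is spent.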

## Integration Example (Python)

```python
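import openai

# Point the official OpenAI client at Oxlo.ai; the base URL is the one line that changes.
client = openai.OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key="YOUR_API_KEY"  # replace with your Oxlo.ai API key
)

# Standard chat completion request, same shape as with OpenAI.
response = client.chat.completions.create(
    model="qwen-3-32b",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=512  # upper bound on generated tokens
)

# Print the assistant's reply.
print(response.choices[0].message.content)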
```
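
Each plan in the pricing table caps the maximum output tokens, so it can help to clamp `max_tokens` on the client before sending. The sketch below is one possible way to do that; the `chat_kwargs` helper is hypothetical, and how the API responds to an over-limit value (rejecting or truncating) is not documented here, so the clamp is purely defensive.

```python
# Max output tokens per plan, copied from the pricing table.
MAX_OUTPUT_TOKENS = {"free": 4_096, "pro": 8_192, "premium": 32_768}


def chat_kwargs(plan: str, model: str, prompt: str, max_tokens: int) -> dict:
    """Arguments for client.chat.completions.create(), with max_tokens capped to the plan limit."""
    cap = MAX_OUTPUT_TOKENS[plan.lower()]
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": min(max_tokens, cap),
    }


kwargs = chat_kwargs("Premium", "qwen-3-32b", "Summarize this document.", max_tokens=50_000)
print(kwargs["max_tokens"])
# With the client from the example above: client.chat.completions.create(**kwargs)
```

Here the requested 50,000 tokens is reduced to the Premium cap of 32,768 before the request is made.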

## Links

- Website: https://oxlo.ai
- Product Dashboard: https://portal.oxlo.ai
- Documentation: https://docs.oxlo.ai
- Pricing: https://oxlo.ai/pricing
- Models: https://oxlo.ai/models
- Contact: hello@oxlo.ai