
API Documentation

TokenLake provides an OpenAI-compatible API, letting you use existing OpenAI SDKs and tools without modification.

Authentication

All requests must include a Bearer token in the Authorization header.

```http
Authorization: Bearer sk-th-your-api-key
```

Base URL

```
https://api.tokenlake.ai/v1
```
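If you are not using an SDK, the base URL and Authorization header combine into a plain HTTP request. A minimal sketch of the request construction (the key is a placeholder; the actual `requests.post(url, headers=headers, json=body)` call is omitted since it needs a live key):

```python
BASE_URL = "https://api.tokenlake.ai/v1"
API_KEY = "sk-th-your-api-key"  # placeholder key

def build_request(path: str) -> tuple[str, dict]:
    """Return the full URL and headers for an authenticated JSON request.

    Note: BASE_URL already includes /v1, so paths are relative to it,
    e.g. "/chat/completions".
    """
    url = f"{BASE_URL}{path}"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    }
    return url, headers

url, headers = build_request("/chat/completions")
print(url)  # → https://api.tokenlake.ai/v1/chat/completions
```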

Endpoints

| Method | Endpoint | Description |
|---|---|---|
| POST | `/v1/chat/completions` | Chat completions (supports streaming) |
| GET | `/v1/models` | List available models |
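As a sketch of working with the `/v1/models` endpoint, the snippet below assumes the response follows the standard OpenAI list shape (`{"object": "list", "data": [{"id": ...}, ...]}`); the `model_ids` helper is a hypothetical convenience, not part of the API:

```python
def model_ids(models_response: dict) -> list[str]:
    # Extract model ids from an OpenAI-style list response.
    return [m["id"] for m in models_response.get("data", [])]

# Sample payload in the shape returned by GET /v1/models (ids are illustrative).
sample = {
    "object": "list",
    "data": [
        {"id": "qwen3-8b", "object": "model"},
        {"id": "gemma-3-9b", "object": "model"},
    ],
}
print(model_ids(sample))  # → ['qwen3-8b', 'gemma-3-9b']
```

With the Python SDK, the same list is available via `client.models.list()`.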

Code Examples

Python

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.tokenlake.ai/v1",
    api_key="sk-th-your-api-key",
)

response = client.chat.completions.create(
    model="qwen3-8b",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=False,
    temperature=1.0,
    max_tokens=1024,
)

print(response.choices[0].message.content)
```

Node.js

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.tokenlake.ai/v1",
  apiKey: "sk-th-your-api-key",
});

const response = await client.chat.completions.create({
  model: "qwen3-8b",
  messages: [{ role: "user", content: "Hello!" }],
  stream: false,
  temperature: 1.0,
  max_tokens: 1024,
});

console.log(response.choices[0].message.content);
```

cURL

```bash
curl https://api.tokenlake.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-th-your-api-key" \
  -d '{
    "model": "qwen3-8b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 1.0,
    "max_tokens": 1024
  }'
```

Streaming (Python)

```python
# Streaming example
for chunk in client.chat.completions.create(
    model="qwen3-8b",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
):
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
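When you need the full reply as well as live output, the streamed deltas can be accumulated as they arrive. A minimal sketch (the strings below stand in for `chunk.choices[0].delta.content` values, which may be `None`):

```python
def accumulate(deltas) -> str:
    # Join streamed delta fragments, skipping None/empty deltas.
    parts = []
    for delta in deltas:
        if delta:
            parts.append(delta)
    return "".join(parts)

print(accumulate(["Once", " upon", " a", " time", None]))  # → Once upon a time
```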

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | required | The model to use (e.g. `qwen3-8b`, `gemma-3-9b`) |
| `messages` | array | required | Array of message objects with `role` and `content` |
| `stream` | boolean | optional | Enable streaming responses (Server-Sent Events) |
| `temperature` | number | optional | Sampling temperature, 0–2 (default: 1) |
| `max_tokens` | integer | optional | Maximum tokens to generate |

Response Format

All responses follow the standard OpenAI response format, including streaming delta events.

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1714000000,
  "model": "qwen3-8b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 12,
    "total_tokens": 22
  }
}
```
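If you work with the raw JSON rather than an SDK object, the fields above map directly onto dictionary access. A small sketch using the example payload:

```python
# The example response payload from above, as a Python dict.
response = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "created": 1714000000,
    "model": "qwen3-8b",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 10, "completion_tokens": 12, "total_tokens": 22},
}

content = response["choices"][0]["message"]["content"]
total_tokens = response["usage"]["total_tokens"]
print(content)       # → Hello! How can I help you today?
print(total_tokens)  # → 22
```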

Error Codes

| Code | Description |
|---|---|
| 401 | Unauthorized: invalid or missing API key |
| 402 | Payment Required: insufficient balance |
| 404 | Not Found: model or endpoint does not exist |
| 429 | Too Many Requests: rate limit exceeded |
| 503 | Service Unavailable: upstream model is down |

Rate Limiting

Requests are limited per API key per minute. If you exceed the limit, you will receive a 429 response. Contact support to increase your limits.
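One common way to handle 429 responses is exponential backoff. The sketch below separates the delay schedule (pure, shown with sample values) from the retry loop; `RateLimitError` is the exception the official OpenAI Python SDK raises on HTTP 429, and the actual API call is passed in as a zero-argument function:

```python
import time

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    # Exponential backoff: 1s, 2s, 4s, ... capped at `cap` seconds.
    return min(base * (2 ** attempt), cap)

def with_retries(call, max_attempts: int = 5):
    """Retry `call` on rate-limit errors with exponential backoff.

    `call` is any zero-argument function, e.g. a lambda wrapping
    client.chat.completions.create(...).
    """
    from openai import RateLimitError  # requires the `openai` package
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(backoff_delay(attempt))

print([backoff_delay(a) for a in range(6)])  # → [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```

The OpenAI Python SDK also retries some failures automatically; the `max_retries` client option controls that behavior.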

SDK Support

TokenLake is fully compatible with the official OpenAI SDKs. Simply set the base URL and your API key.

OpenAI Python SDK

```bash
pip install openai
```

OpenAI Node.js SDK

```bash
npm install openai
```