
API Documentation

TokenLake provides an OpenAI-compatible API, letting you use existing OpenAI SDKs and tools without modification.

Authentication

All requests must include a Bearer token in the Authorization header.

```http
Authorization: Bearer sk-th-your-api-key
```

Base URL

```
https://api.tokenlake.ai/v1
```
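If you are not using an SDK, the base URL and Authorization header combine into a plain HTTP request. A minimal sketch of the request construction (the key is a placeholder; the actual `requests.post(url, headers=headers, json=body)` call is omitted since it needs a live key):

```python
BASE_URL = "https://api.tokenlake.ai/v1"
API_KEY = "sk-th-your-api-key"  # placeholder key

def build_request(path: str) -> tuple[str, dict]:
    """Return the full URL and headers for an authenticated JSON request.

    Note: BASE_URL already includes /v1, so paths are relative to it,
    e.g. "/chat/completions".
    """
    url = f"{BASE_URL}{path}"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    }
    return url, headers

url, headers = build_request("/chat/completions")
print(url)  # → https://api.tokenlake.ai/v1/chat/completions
```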

Endpoints

| Method | Endpoint | Description |
|---|---|---|
| POST | `/v1/chat/completions` | Chat completions (supports streaming) |
| GET | `/v1/models` | List available models |
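As a sketch of working with the `/v1/models` endpoint, the snippet below assumes the response follows the standard OpenAI list shape (`{"object": "list", "data": [{"id": ...}, ...]}`); the `model_ids` helper is a hypothetical convenience, not part of the API:

```python
def model_ids(models_response: dict) -> list[str]:
    # Extract model ids from an OpenAI-style list response.
    return [m["id"] for m in models_response.get("data", [])]

# Sample payload in the shape returned by GET /v1/models (ids are illustrative).
sample = {
    "object": "list",
    "data": [
        {"id": "qwen3-8b", "object": "model"},
        {"id": "gemma-3-9b", "object": "model"},
    ],
}
print(model_ids(sample))  # → ['qwen3-8b', 'gemma-3-9b']
```

With the Python SDK, the same list is available via `client.models.list()`.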

Code Examples

Python

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.tokenlake.ai/v1",
    api_key="sk-th-your-api-key",
)

response = client.chat.completions.create(
    model="qwen3-8b",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=False,
    temperature=1.0,
    max_tokens=1024,
)

print(response.choices[0].message.content)
```

Node.js

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.tokenlake.ai/v1",
  apiKey: "sk-th-your-api-key",
});

const response = await client.chat.completions.create({
  model: "qwen3-8b",
  messages: [{ role: "user", content: "Hello!" }],
  stream: false,
  temperature: 1.0,
  max_tokens: 1024,
});

console.log(response.choices[0].message.content);
```

cURL

```bash
curl https://api.tokenlake.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-th-your-api-key" \
  -d '{
    "model": "qwen3-8b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 1.0,
    "max_tokens": 1024
  }'
```

Streaming (Python)

```python
# Streaming example
for chunk in client.chat.completions.create(
    model="qwen3-8b",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
):
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
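When you need the full reply as well as live output, the streamed deltas can be accumulated as they arrive. A minimal sketch (the strings below stand in for `chunk.choices[0].delta.content` values, which may be `None`):

```python
def accumulate(deltas) -> str:
    # Join streamed delta fragments, skipping None/empty deltas.
    parts = []
    for delta in deltas:
        if delta:
            parts.append(delta)
    return "".join(parts)

print(accumulate(["Once", " upon", " a", " time", None]))  # → Once upon a time
```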

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | required | The model to use (e.g. `qwen3-8b`, `gemma-3-9b`) |
| `messages` | array | required | Array of message objects with `role` and `content` |
| `stream` | boolean | optional | Enable streaming responses (Server-Sent Events) |
| `temperature` | number | optional | Sampling temperature, 0–2 (default: 1) |
| `max_tokens` | integer | optional | Maximum tokens to generate |

Response Format

All responses follow the standard OpenAI response format, including streaming delta events.

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1714000000,
  "model": "qwen3-8b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 12,
    "total_tokens": 22
  }
}
```
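If you work with the raw JSON rather than an SDK object, the fields above map directly onto dictionary access. A small sketch using the example payload:

```python
# The example response payload from above, as a Python dict.
response = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "created": 1714000000,
    "model": "qwen3-8b",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 10, "completion_tokens": 12, "total_tokens": 22},
}

content = response["choices"][0]["message"]["content"]
total_tokens = response["usage"]["total_tokens"]
print(content)       # → Hello! How can I help you today?
print(total_tokens)  # → 22
```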

Error Codes

| Code | Description |
|---|---|
| 401 | Unauthorized: invalid or missing API key |
| 402 | Payment Required: insufficient balance |
| 404 | Not Found: model or endpoint does not exist |
| 429 | Too Many Requests: rate limit exceeded |
| 503 | Service Unavailable: upstream model is down |

Rate Limiting

Requests are limited per API key per minute. If you exceed the limit, you will receive a 429 response. Contact support to increase your limits.
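One common way to handle 429 responses is exponential backoff. The sketch below separates the delay schedule (pure, shown with sample values) from the retry loop; `RateLimitError` is the exception the official OpenAI Python SDK raises on HTTP 429, and the actual API call is passed in as a zero-argument function:

```python
import time

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    # Exponential backoff: 1s, 2s, 4s, ... capped at `cap` seconds.
    return min(base * (2 ** attempt), cap)

def with_retries(call, max_attempts: int = 5):
    """Retry `call` on rate-limit errors with exponential backoff.

    `call` is any zero-argument function, e.g. a lambda wrapping
    client.chat.completions.create(...).
    """
    from openai import RateLimitError  # requires the `openai` package
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(backoff_delay(attempt))

print([backoff_delay(a) for a in range(6)])  # → [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```

The OpenAI Python SDK also retries some failures automatically; the `max_retries` client option controls that behavior.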

SDK Support

TokenLake is fully compatible with the official OpenAI SDKs. Simply set the base URL and your API key.

OpenAI Python SDK

```bash
pip install openai
```

OpenAI Node.js SDK

```bash
npm install openai
```