PeerLLM Developer Guide

OpenAI-compatible REST API — use any OpenAI SDK or HTTP client


Overview

PeerLLM exposes an OpenAI-compatible REST API so you can use any OpenAI SDK or HTTP client to interact with the PeerLLM network. Requests are routed to distributed community hosts running open-weight models, or to LLooMA1.0 — PeerLLM's network-native orchestration model that splits complex tasks across multiple hosts in parallel.

Base URL

https://api.peerllm.com

All endpoints are prefixed with /v1.

Authentication

Every request must include a valid API key in the Authorization header:

Authorization: Bearer <YOUR_API_KEY>

Generating an API Key

  1. Go to the Hosts Portal at hosts.peerllm.com.
  2. Sign in or create an account.
  3. Navigate to the API Management section.
  4. Click Generate New Key.
  5. Copy and securely store your key — it will not be shown again.

Important: Treat your API key like a password. Do not commit it to source control or share it publicly.
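Once generated, the key is attached to every request via the Authorization header shown above. A minimal sketch in Python (the header format comes from this guide; the `auth_headers` helper and the `PEERLLM_API_KEY` environment variable name are illustrative choices, not part of the API):

```python
import os

def auth_headers(api_key: str) -> dict:
    """Build the Authorization header PeerLLM expects on every request."""
    return {"Authorization": f"Bearer {api_key}"}

# Read the key from the environment rather than hard-coding it,
# since keys must never be committed to source control.
API_KEY = os.environ.get("PEERLLM_API_KEY", "YOUR_API_KEY")
print(auth_headers(API_KEY))
```

Reading the key from the environment keeps it out of source files and makes rotation a matter of updating one variable.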

Tokens & Billing

PeerLLM uses a token balance system. You must have a positive token balance before you can make API calls.

How to Get Tokens

| Method | Steps |
| --- | --- |
| Purchase | Go to hosts.peerllm.com → API Management → Purchase Tokens, then select a package and complete payment. |
| Redeem a Token Code | Go to hosts.peerllm.com → API Management → Redeem Code, then enter your code. |

If your balance reaches zero, all /v1/chat/completions requests will be rejected with a 402 Payment Required error until you add more tokens.

Endpoints

GET /v1/models

Returns the list of all approved models currently available on the PeerLLM network.

Note: This endpoint does not require authentication.

Request

GET /v1/models HTTP/1.1
Host: api.peerllm.com

Response 200 OK

{
  "object": "list",
  "data": [
    {
      "id": "LLooMA1.0",
      "object": "model",
      "created": 1776572039,
      "owned_by": "peerllm",
      "metadata": {
        "source": "orchestration",
        "repo": null,
        "file": null,
        "qtype": null,
        "size": null,
        "ram": null,
        "gpu": null,
        "checksum": null,
        "description": "LLooMA Orchestration Model - Centralized AI orchestration with distributed execution across PeerLLM hosts. Automatically splits complex tasks into parallel subtasks for optimal performance."
      }
    },
    {
      "id": "mistral-7b-instruct-v0.2.Q8_0",
      "object": "model",
      "created": 1776572039,
      "owned_by": "peerllm",
      "metadata": {
        "source": "huggingface",
        "repo": "TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
        "file": "mistral-7b-instruct-v0.2.Q8_0.gguf",
        "qtype": "Q8_0",
        "size": 7241732096,
        "ram": "16GB+",
        "gpu": "RTX 4060 or higher",
        "checksum": "sha256:3a6fbf4a41a1d52e415a4958cde6856d34b2db93",
        "description": "Quantized Mistral 7B Instruct v0.2 model hosted by TheBloke. Improved reasoning, context depth (32K tokens), and conversational performance."
      }
    }
  ]
}

Model Metadata Fields

| Field | Type | Description |
| --- | --- | --- |
| id | string | The model identifier — use this value in the model field of chat completions. |
| object | string | Always "model". |
| created | integer | Unix timestamp. |
| owned_by | string | Owner of the model (e.g., "peerllm"). |
| metadata.source | string | Origin of the model ("huggingface", "orchestration", etc.). |
| metadata.repo | string? | Source repository (e.g., HuggingFace repo). |
| metadata.file | string? | Model filename. |
| metadata.qtype | string? | Quantization type (e.g., Q8_0, Q4_K_M). |
| metadata.size | integer? | Model file size in bytes. |
| metadata.ram | string? | Recommended RAM. |
| metadata.gpu | string? | GPU requirement. |
| metadata.checksum | string? | File checksum for integrity verification. |
| metadata.description | string? | Human-readable model description. |
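Since orchestration models like LLooMA1.0 carry null file metadata, the model list can be filtered client-side by `metadata.source`. A sketch against a trimmed copy of the sample response above (the `local_models` helper is an illustrative name, not part of any SDK):

```python
# A trimmed copy of the GET /v1/models response shown above.
models_response = {
    "object": "list",
    "data": [
        {"id": "LLooMA1.0",
         "metadata": {"source": "orchestration", "qtype": None, "size": None}},
        {"id": "mistral-7b-instruct-v0.2.Q8_0",
         "metadata": {"source": "huggingface", "qtype": "Q8_0",
                      "size": 7241732096}},
    ],
}

def local_models(payload: dict) -> list[str]:
    """Return IDs of models backed by actual weight files,
    excluding network-native orchestration models."""
    return [m["id"] for m in payload["data"]
            if m["metadata"]["source"] != "orchestration"]

print(local_models(models_response))  # ['mistral-7b-instruct-v0.2.Q8_0']
```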

POST /v1/chat/completions

Send a chat completion request. Supports both streaming (stream: true, the default) and non-streaming (stream: false) modes.

Request

POST /v1/chat/completions HTTP/1.1
Host: api.peerllm.com
Authorization: Bearer <YOUR_API_KEY>
Content-Type: application/json

{
  "model": "LLooMA1.0",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Explain quantum computing in simple terms." }
  ],
  "stream": true,
  "temperature": 0.7,
  "max_tokens": 1024
}

Request Body

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| model | string | Yes | none | Model ID from /v1/models. Use "LLooMA1.0" for orchestrated multi-host responses. |
| messages | array | Yes | none | Array of message objects with role ("system", "user", "assistant") and content. |
| stream | boolean | No | true | Whether to stream the response via SSE. |
| temperature | number | No | null | Sampling temperature (0.0–2.0). |
| max_tokens | number | No | null | Maximum tokens to generate. |

Non-Streaming Response 200 OK

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1750000000,
  "model": "LLooMA1.0",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses qubits..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 150,
    "total_tokens": 175
  }
}
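A non-streaming round trip can be assembled with nothing but the Python standard library; the endpoint, headers, and body fields come from this guide, while `build_chat_request` is an illustrative helper (the actual network call is left commented out):

```python
import json
import urllib.request

def build_chat_request(model: str, user_content: str,
                       api_key: str = "YOUR_API_KEY") -> urllib.request.Request:
    """Build a non-streaming POST /v1/chat/completions request
    using only the fields documented above."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
        "stream": False,  # stream defaults to true, so disable it explicitly
    }
    return urllib.request.Request(
        "https://api.peerllm.com/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("LLooMA1.0", "Hello!")
# with urllib.request.urlopen(req) as resp:
#     reply = json.loads(resp.read())
#     print(reply["choices"][0]["message"]["content"])
```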

Streaming Response 200 OK

The response is sent as Server-Sent Events (text/event-stream). Each event contains a JSON chunk:

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1750000000,"model":"LLooMA1.0","choices":[{"index":0,"delta":{"role":"assistant","content":"Quantum"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1750000000,"model":"LLooMA1.0","choices":[{"index":0,"delta":{"content":" computing"},"finish_reason":null}]}

...

data: [DONE]
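If you are not using an SDK, the stream above can be decoded by hand: each non-empty line starts with `data: `, carries one JSON chunk, and the literal `[DONE]` marks the end. A sketch, using the sample chunks shown above (`parse_sse_line` is an illustrative helper):

```python
import json

def parse_sse_line(line: str):
    """Extract the delta text from one 'data: ...' SSE line.
    Returns None for blank lines, '[DONE]', and chunks without content."""
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):].strip()
    if payload == "[DONE]":
        return None
    chunk = json.loads(payload)
    return chunk["choices"][0]["delta"].get("content")

# The two sample chunks and terminator from the example above.
stream = [
    'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk",'
    '"created":1750000000,"model":"LLooMA1.0","choices":[{"index":0,'
    '"delta":{"role":"assistant","content":"Quantum"},"finish_reason":null}]}',
    'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk",'
    '"created":1750000000,"model":"LLooMA1.0","choices":[{"index":0,'
    '"delta":{"content":" computing"},"finish_reason":null}]}',
    "data: [DONE]",
]
text = "".join(p for p in (parse_sse_line(l) for l in stream) if p)
print(text)  # Quantum computing
```

Note that the first chunk carries a `role` in its delta while later chunks carry only `content`, so the parser reads `content` with `.get()` rather than indexing.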

About LLooMA1.0

LLooMA1.0 is a network-native orchestration model. It does not run on any single host. Instead, it:

  1. Analyzes your prompt and determines if it can be split into parallel subtasks.
  2. If splittable, it distributes subtasks across multiple PeerLLM hosts and synthesizes the results.
  3. If not splittable, it races two PeerLLM hosts and uses the fastest response. Strict latency enforcement (1 s first-token deadline, 1 s inter-token timeout, 5 s total cap) keeps responses fast — if hosts are slow, a centralized AI fallback kicks in automatically.

For all other model IDs, the request is sent directly to a host running that specific model.

Error Reference

All errors follow a consistent JSON structure:

{
  "error": "Error message string"
}

Or in some cases the OpenAI-style structured format:

{
  "error": {
    "message": "Detailed error message",
    "type": "error_type",
    "code": "error_code"
  }
}

| HTTP Status | Error | Cause | Resolution |
| --- | --- | --- | --- |
| 400 | "Missing model name." | The model field is empty or missing. | Provide a valid model ID. |
| 400 | "Unknown model '{model}'." | The requested model is not in the PeerLLM catalog (does not apply to LLooMA1.0). | Call GET /v1/models to see available models. |
| 400 | "Missing messages." | The messages array is null or empty. | Provide at least one message with role "user". |
| 401 | "Invalid or expired API key." | The Authorization header is missing or the key is invalid/expired. | Generate a new API key at hosts.peerllm.com. |
| 402 | "Insufficient token balance." | Your account's token balance is zero or negative. | Purchase tokens or redeem a code at hosts.peerllm.com. |
| 404 | "No hosts available." | No PeerLLM hosts are online for the requested model. | Try again later, or use LLooMA1.0, which has centralized AI fallback. |
| 500 | "Failed to process request with centralized AI." | The centralized AI fallback encountered an internal error. | Retry the request. If persistent, check PeerLLM status. |
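Per the table above, only the 404 and 500 cases are transient; the 4xx authentication and billing errors require action on your account rather than a retry. A small sketch of that distinction (the `ERROR_ACTIONS` map and `should_retry` helper are illustrative, not part of the API):

```python
# Map the HTTP statuses documented above to a suggested client action.
# The retryable flags are an interpretation of the table, not an API contract.
ERROR_ACTIONS = {
    400: ("fix the request body", False),
    401: ("regenerate your API key", False),
    402: ("purchase tokens or redeem a code", False),
    404: ("retry later or fall back to LLooMA1.0", True),
    500: ("retry; check PeerLLM status if persistent", True),
}

def should_retry(status: int) -> bool:
    """Only 'no hosts available' and internal fallback errors are transient."""
    return ERROR_ACTIONS.get(status, ("unknown error", False))[1]

print(should_retry(404), should_retry(402))  # True False
```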

Quick Start

Using curl

# List available models
curl https://api.peerllm.com/v1/models

# Chat completion (streaming)
curl https://api.peerllm.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "LLooMA1.0",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'

Using an OpenAI SDK

Python

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.peerllm.com/v1"
)

response = client.chat.completions.create(
    model="LLooMA1.0",
    messages=[
        {"role": "user", "content": "What is PeerLLM?"}
    ],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

JavaScript

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://api.peerllm.com/v1",
});

const stream = await client.chat.completions.create({
  model: "LLooMA1.0",
  messages: [{ role: "user", content: "What is PeerLLM?" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}

C#

using OpenAI;
using OpenAI.Chat;

var client = new ChatClient(
    model: "LLooMA1.0",
    credential: new ApiKeyCredential("YOUR_API_KEY"),
    options: new OpenAIClientOptions
    {
        Endpoint = new Uri("https://api.peerllm.com/v1")
    });

var stream = client.CompleteChatStreamingAsync(
    new List<ChatMessage>
    {
        new UserChatMessage("What is PeerLLM?")
    });

await foreach (var update in stream)
{
    foreach (var part in update.ContentUpdate)
    {
        Console.Write(part.Text);
    }
}

Support

For questions, issues, or feedback, visit the Hosts Portal at hosts.peerllm.com.