PeerLLM Developer Guide

OpenAI-compatible REST API — use any OpenAI SDK or HTTP client


Overview

PeerLLM exposes an OpenAI-compatible REST API so you can use any OpenAI SDK or HTTP client to interact with the PeerLLM network. Requests are routed to distributed community hosts running open-weight models, or to LLooMA1.0 — PeerLLM's network-native orchestration model that splits complex tasks across multiple hosts in parallel.

Base URL

https://api.peerllm.com

All endpoints are prefixed with /v1.

Authentication

Every request must include a valid API key in the Authorization header:

Authorization: Bearer <YOUR_API_KEY>

Generating an API Key

  1. Go to the Hosts Portal at hosts.peerllm.com.
  2. Sign in or create an account.
  3. Navigate to the API Management section.
  4. Click Generate New Key.
  5. Copy and securely store your key — it will not be shown again.

Important: Treat your API key like a password. Do not commit it to source control or share it publicly.
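Once generated, the key is attached to every request via the Authorization header shown above. A minimal sketch in Python (the header format comes from this guide; the `auth_headers` helper and the `PEERLLM_API_KEY` environment variable name are illustrative choices, not part of the API):

```python
import os

def auth_headers(api_key: str) -> dict:
    """Build the Authorization header PeerLLM expects on every request."""
    return {"Authorization": f"Bearer {api_key}"}

# Read the key from the environment rather than hard-coding it,
# since keys must never be committed to source control.
API_KEY = os.environ.get("PEERLLM_API_KEY", "YOUR_API_KEY")
print(auth_headers(API_KEY))
```

Reading the key from the environment keeps it out of source files and makes rotation a matter of updating one variable.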

Tokens & Billing

PeerLLM uses a token balance system. You must have a positive token balance before you can make API calls.

How to Get Tokens

| Method | Steps |
| --- | --- |
| Purchase | Go to hosts.peerllm.com → API Management → Purchase Tokens, then select a package and complete payment. |
| Redeem a Token Code | Go to hosts.peerllm.com → API Management → Redeem Code, then enter your code. |

If your balance reaches zero, all /v1/chat/completions requests will be rejected with a 402 Payment Required error until you add more tokens.

Endpoints

GET /v1/models

Returns the list of all approved models currently available on the PeerLLM network.

Note: This endpoint does not require authentication.

Request

GET /v1/models HTTP/1.1
Host: api.peerllm.com

Response 200 OK

{
  "object": "list",
  "data": [
    {
      "id": "LLooMA1.0",
      "object": "model",
      "created": 1776572039,
      "owned_by": "peerllm",
      "metadata": {
        "source": "orchestration",
        "repo": null,
        "file": null,
        "qtype": null,
        "size": null,
        "ram": null,
        "gpu": null,
        "checksum": null,
        "description": "LLooMA Orchestration Model - Centralized AI orchestration with distributed execution across PeerLLM hosts. Automatically splits complex tasks into parallel subtasks for optimal performance."
      }
    },
    {
      "id": "mistral-7b-instruct-v0.2.Q8_0",
      "object": "model",
      "created": 1776572039,
      "owned_by": "peerllm",
      "metadata": {
        "source": "huggingface",
        "repo": "TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
        "file": "mistral-7b-instruct-v0.2.Q8_0.gguf",
        "qtype": "Q8_0",
        "size": 7241732096,
        "ram": "16GB+",
        "gpu": "RTX 4060 or higher",
        "checksum": "sha256:3a6fbf4a41a1d52e415a4958cde6856d34b2db93",
        "description": "Quantized Mistral 7B Instruct v0.2 model hosted by TheBloke. Improved reasoning, context depth (32K tokens), and conversational performance."
      }
    }
  ]
}

Model Metadata Fields

| Field | Type | Description |
| --- | --- | --- |
| id | string | The model identifier — use this value in the model field of chat completions. |
| object | string | Always "model". |
| created | integer | Unix timestamp. |
| owned_by | string | Owner of the model (e.g., "peerllm"). |
| metadata.source | string | Origin of the model ("huggingface", "orchestration", etc.). |
| metadata.repo | string? | Source repository (e.g., HuggingFace repo). |
| metadata.file | string? | Model filename. |
| metadata.qtype | string? | Quantization type (e.g., Q8_0, Q4_K_M). |
| metadata.size | integer? | Model file size in bytes. |
| metadata.ram | string? | Recommended RAM. |
| metadata.gpu | string? | GPU requirement. |
| metadata.checksum | string? | File checksum for integrity verification. |
| metadata.description | string? | Human-readable model description. |
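Since orchestration models like LLooMA1.0 carry null file metadata, the model list can be filtered client-side by `metadata.source`. A sketch against a trimmed copy of the sample response above (the `local_models` helper is an illustrative name, not part of any SDK):

```python
# A trimmed copy of the GET /v1/models response shown above.
models_response = {
    "object": "list",
    "data": [
        {"id": "LLooMA1.0",
         "metadata": {"source": "orchestration", "qtype": None, "size": None}},
        {"id": "mistral-7b-instruct-v0.2.Q8_0",
         "metadata": {"source": "huggingface", "qtype": "Q8_0",
                      "size": 7241732096}},
    ],
}

def local_models(payload: dict) -> list[str]:
    """Return IDs of models backed by actual weight files,
    excluding network-native orchestration models."""
    return [m["id"] for m in payload["data"]
            if m["metadata"]["source"] != "orchestration"]

print(local_models(models_response))  # ['mistral-7b-instruct-v0.2.Q8_0']
```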

POST /v1/chat/completions

Send a chat completion request. Supports both streaming (stream: true, the default) and non-streaming (stream: false) modes.

Request

POST /v1/chat/completions HTTP/1.1
Host: api.peerllm.com
Authorization: Bearer <YOUR_API_KEY>
Content-Type: application/json

{
  "model": "LLooMA1.0",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Explain quantum computing in simple terms." }
  ],
  "stream": true,
  "temperature": 0.7,
  "max_tokens": 1024
}

Request Body

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| model | string | Yes | none | Model ID from /v1/models. Use "LLooMA1.0" for orchestrated multi-host responses. |
| messages | array | Yes | none | Array of message objects with role ("system", "user", "assistant") and content. |
| stream | boolean | No | true | Whether to stream the response via SSE. |
| temperature | number | No | null | Sampling temperature (0.0–2.0). |
| max_tokens | number | No | null | Maximum tokens to generate. |

Non-Streaming Response 200 OK

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1750000000,
  "model": "LLooMA1.0",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses qubits..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 150,
    "total_tokens": 175
  }
}
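A non-streaming round trip can be assembled with nothing but the Python standard library; the endpoint, headers, and body fields come from this guide, while `build_chat_request` is an illustrative helper (the actual network call is left commented out):

```python
import json
import urllib.request

def build_chat_request(model: str, user_content: str,
                       api_key: str = "YOUR_API_KEY") -> urllib.request.Request:
    """Build a non-streaming POST /v1/chat/completions request
    using only the fields documented above."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
        "stream": False,  # stream defaults to true, so disable it explicitly
    }
    return urllib.request.Request(
        "https://api.peerllm.com/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("LLooMA1.0", "Hello!")
# with urllib.request.urlopen(req) as resp:
#     reply = json.loads(resp.read())
#     print(reply["choices"][0]["message"]["content"])
```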

Streaming Response 200 OK

The response is sent as Server-Sent Events (text/event-stream). Each event contains a JSON chunk:

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1750000000,"model":"LLooMA1.0","choices":[{"index":0,"delta":{"role":"assistant","content":"Quantum"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1750000000,"model":"LLooMA1.0","choices":[{"index":0,"delta":{"content":" computing"},"finish_reason":null}]}

...

data: [DONE]
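If you are not using an SDK, the stream above can be decoded by hand: each non-empty line starts with `data: `, carries one JSON chunk, and the literal `[DONE]` marks the end. A sketch, using the sample chunks shown above (`parse_sse_line` is an illustrative helper):

```python
import json

def parse_sse_line(line: str):
    """Extract the delta text from one 'data: ...' SSE line.
    Returns None for blank lines, '[DONE]', and chunks without content."""
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):].strip()
    if payload == "[DONE]":
        return None
    chunk = json.loads(payload)
    return chunk["choices"][0]["delta"].get("content")

# The two sample chunks and terminator from the example above.
stream = [
    'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk",'
    '"created":1750000000,"model":"LLooMA1.0","choices":[{"index":0,'
    '"delta":{"role":"assistant","content":"Quantum"},"finish_reason":null}]}',
    'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk",'
    '"created":1750000000,"model":"LLooMA1.0","choices":[{"index":0,'
    '"delta":{"content":" computing"},"finish_reason":null}]}',
    "data: [DONE]",
]
text = "".join(p for p in (parse_sse_line(l) for l in stream) if p)
print(text)  # Quantum computing
```

Note that the first chunk carries a `role` in its delta while later chunks carry only `content`, so the parser reads `content` with `.get()` rather than indexing.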

About LLooMA1.0

LLooMA1.0 is a network-native orchestration model. It does not run on any single host. Instead, it:

  1. Analyzes your prompt and determines if it can be split into parallel subtasks.
  2. If splittable, it distributes subtasks across multiple PeerLLM hosts and synthesizes the results.
  3. If not splittable, it races two PeerLLM hosts and uses the fastest response. Strict latency enforcement (1 s first-token deadline, 1 s inter-token timeout, 5 s total cap) keeps responses fast — if hosts are slow, a centralized AI fallback kicks in automatically.

For all other model IDs, the request is sent directly to a host running that specific model.

Error Reference

All errors follow a consistent JSON structure:

{
  "error": "Error message string"
}

Or in some cases the OpenAI-style structured format:

{
  "error": {
    "message": "Detailed error message",
    "type": "error_type",
    "code": "error_code"
  }
}

| HTTP Status | Error | Cause | Resolution |
| --- | --- | --- | --- |
| 400 | "Missing model name." | The model field is empty or missing. | Provide a valid model ID. |
| 400 | "Unknown model '{model}'." | The requested model is not in the PeerLLM catalog (does not apply to LLooMA1.0). | Call GET /v1/models to see available models. |
| 400 | "Missing messages." | The messages array is null or empty. | Provide at least one message with role "user". |
| 401 | "Invalid or expired API key." | The Authorization header is missing or the key is invalid/expired. | Generate a new API key at hosts.peerllm.com. |
| 402 | "Insufficient token balance." | Your account's token balance is zero or negative. | Purchase tokens or redeem a code at hosts.peerllm.com. |
| 404 | "No hosts available." | No PeerLLM hosts are online for the requested model. | Try again later, or use LLooMA1.0, which has centralized AI fallback. |
| 500 | "Failed to process request with centralized AI." | The centralized AI fallback encountered an internal error. | Retry the request. If persistent, check PeerLLM status. |
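Per the table above, only the 404 and 500 cases are transient; the 4xx authentication and billing errors require action on your account rather than a retry. A small sketch of that distinction (the `ERROR_ACTIONS` map and `should_retry` helper are illustrative, not part of the API):

```python
# Map the HTTP statuses documented above to a suggested client action.
# The retryable flags are an interpretation of the table, not an API contract.
ERROR_ACTIONS = {
    400: ("fix the request body", False),
    401: ("regenerate your API key", False),
    402: ("purchase tokens or redeem a code", False),
    404: ("retry later or fall back to LLooMA1.0", True),
    500: ("retry; check PeerLLM status if persistent", True),
}

def should_retry(status: int) -> bool:
    """Only 'no hosts available' and internal fallback errors are transient."""
    return ERROR_ACTIONS.get(status, ("unknown error", False))[1]

print(should_retry(404), should_retry(402))  # True False
```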

Quick Start

Using curl

# List available models
curl https://api.peerllm.com/v1/models

# Chat completion (streaming)
curl https://api.peerllm.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "LLooMA1.0",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'

Using an OpenAI SDK

Python

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.peerllm.com/v1"
)

response = client.chat.completions.create(
    model="LLooMA1.0",
    messages=[
        {"role": "user", "content": "What is PeerLLM?"}
    ],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

JavaScript

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://api.peerllm.com/v1",
});

const stream = await client.chat.completions.create({
  model: "LLooMA1.0",
  messages: [{ role: "user", content: "What is PeerLLM?" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}

C#

using OpenAI;
using OpenAI.Chat;

var client = new ChatClient(
    model: "LLooMA1.0",
    credential: new ApiKeyCredential("YOUR_API_KEY"),
    options: new OpenAIClientOptions
    {
        Endpoint = new Uri("https://api.peerllm.com/v1")
    });

var stream = client.CompleteChatStreamingAsync(
    new List<ChatMessage>
    {
        new UserChatMessage("What is PeerLLM?")
    });

await foreach (var update in stream)
{
    foreach (var part in update.ContentUpdate)
    {
        Console.Write(part.Text);
    }
}

Support

For questions, issues, or feedback, visit the Hosts Portal at hosts.peerllm.com.