# POST /v1/chat/completions

The main inference endpoint. Send a list of messages, get back a model response. OpenAI-compatible: most existing SDKs work without changes.

## Endpoint

```http
POST /v1/chat/completions
```

## Request body
| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | yes | Model id, e.g. `gpt-4o-mini`. See `/v1/models`. |
| `messages` | array | yes | List of message objects with `role` and `content`. |
| `temperature` | number | no | 0–2. Default depends on the model. |
| `max_tokens` | integer | no | Cap on the number of generated tokens. |
| `stream` | boolean | no | If `true`, the response is streamed as server-sent events. |
| `tools` | array | no | OpenAI-style function/tool definitions. |
| `response_format` | object | no | Use `{ "type": "json_object" }` for guaranteed JSON output. |
## Example request

```bash
curl https://api.foxora.ai/v1/chat/completions \
  -H "Authorization: Bearer $FOXORA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      { "role": "system", "content": "You are a concise assistant." },
      { "role": "user", "content": "Summarise SOLID in one paragraph." }
    ],
    "temperature": 0.4
  }'
```

## Example response
```json
{
  "id": "cmpl_8YqW...",
  "object": "chat.completion",
  "created": 1712345678,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "SOLID is a set of five design principles..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 84,
    "total_tokens": 112
  }
}
```

## Streaming
Set `stream: true` to receive server-sent events. Each event's `data` payload is a JSON chunk carrying a partial `delta`; the stream ends with a final `data: [DONE]` sentinel.
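If you are not using an SDK, the raw body is standard SSE: lines of the form `data: {...}` followed by the `[DONE]` sentinel. A minimal parsing sketch; `parseSSE` is an illustrative helper (not part of any SDK), and the chunk shape mirrors the non-streaming response with `delta` in place of `message`:

```javascript
// Minimal SSE chunk parser: takes raw event-stream text and
// concatenates the assistant's partial `delta.content` strings.
// `parseSSE` is a hypothetical helper for illustration only.
function parseSSE(raw) {
  let text = "";
  for (const line of raw.split("\n")) {
    if (!line.startsWith("data: ")) continue;       // skip blank lines / comments
    const payload = line.slice("data: ".length).trim();
    if (payload === "[DONE]") break;                // end-of-stream sentinel
    const chunk = JSON.parse(payload);
    text += chunk.choices[0]?.delta?.content ?? ""; // deltas may omit content
  }
  return text;
}

// Example: events as they would arrive on the wire.
const sample = [
  'data: {"choices":[{"delta":{"role":"assistant"}}]}',
  'data: {"choices":[{"delta":{"content":"Hel"}}]}',
  'data: {"choices":[{"delta":{"content":"lo"}}]}',
  "data: [DONE]",
].join("\n\n");

console.log(parseSSE(sample)); // "Hello"
```

With the official SDK you never touch this format; the client parses events for you, as in the example below.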
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.FOXORA_API_KEY,
  baseURL: "https://api.foxora.ai/v1",
});

const stream = await client.chat.completions.create({
  model: "gpt-4o-mini",
  stream: true,
  messages: [{ role: "user", content: "Count to five." }],
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
```

## Tool calls
Pass a `tools` array and Foxora will return a `tool_calls` array on the assistant message when the model decides to invoke one. You execute the tool yourself and feed the result back as a `role: "tool"` message, referencing the `tool_call_id`, in the next call.
Behaviour matches OpenAI
Tool-call semantics are identical to OpenAI's. Their tool-calling docs apply 1:1; just swap the base URL.
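The round trip can be sketched as follows. The assistant reply object below is a hand-written stand-in for what the API would return, and `getWeather`, `get_weather`, and `call_abc123` are hypothetical names used only to illustrate the message flow:

```javascript
// A tool definition in OpenAI's function-calling format.
const tools = [
  {
    type: "function",
    function: {
      name: "get_weather", // hypothetical tool name
      description: "Get the current weather for a city.",
      parameters: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
    },
  },
];

// 1. You send { model, messages, tools }. Suppose the model replies
//    with this assistant message (canned stand-in for an API reply):
const assistantMessage = {
  role: "assistant",
  content: null,
  tool_calls: [
    {
      id: "call_abc123",
      type: "function",
      function: { name: "get_weather", arguments: '{"city":"Oslo"}' },
    },
  ],
};

// 2. Execute the requested tool locally.
function getWeather({ city }) {
  return { city, tempC: 7, sky: "overcast" }; // canned result for the sketch
}
const call = assistantMessage.tool_calls[0];
const result = getWeather(JSON.parse(call.function.arguments));

// 3. Append the assistant message plus a role:"tool" message carrying
//    the result, then POST /v1/chat/completions again with `messages`.
const messages = [
  { role: "user", content: "What's the weather in Oslo?" },
  assistantMessage,
  { role: "tool", tool_call_id: call.id, content: JSON.stringify(result) },
];
```

The second call then returns a normal assistant message that incorporates the tool result.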