
Adding open-source models via API

This guide explains how to register and deploy an open-source model (such as one from Hugging Face) in the Rational AI Control Room through the Management API. The process consists of two steps:

  1. Register the model — creates a catalog record pointing at a Hugging Face repo via huggingFaceId.
  2. Deploy the model — spins up the serving pod (consumes GPU) and exposes an in-cluster service URL.
POST /management/v0/models/register       ->  returns the model id
POST /management/v0/models/{id}/deploy    ->  returns the deploymentId

You can register a model without deploying it, and deploy whenever you’re ready.

The Management API (group backend-management-v0) is served under https://[your-domain].rational.is/api. You can browse it interactively in the Swagger UI at https://[your-domain].rational.is/api/swagger/index.html.

ℹ️REMEMBER

Replace [your-domain] with your own tenant identifier throughout this guide.

Prerequisites

Before you begin, ensure you have:

  • An authenticated session or API key with admin/management rights on the Control Room.
  • The exact Hugging Face repo ID of the model you want (e.g. google/gemma-4-12B-it). Verify it exists on huggingface.co before registering — the API stores the string as-is.
  • Available GPU capacity in the cluster if you intend to deploy immediately.
ℹ️AUTHENTICATION

All examples below assume you replace $TOKEN with a valid bearer token / API key. If you’re driving this from an authenticated browser session, the session cookie is used automatically and you can drop the Authorization header.

Step 1 — Check what’s already registered

Avoid duplicates by listing the current catalog first.

curl -s "https://[your-domain].rational.is/api/management/v0/models?page=0&size=50" \
-H "Accept: application/json" \
-H "Authorization: Bearer $TOKEN"

Each entry includes id, name, type, publisher, and huggingFaceId. If the model already appears, skip to Step 3 (Deploy the model) and use its existing id.
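To check for one specific repo from a script, the filter below searches the list response for an entry whose huggingFaceId matches and prints its id. It is a sketch that assumes jq is installed, and the function name `find_model_id` is hypothetical. It walks the whole response with jq's recursive descent (`..`), so it works whether the entries come back as a bare array or inside a paging wrapper (this guide does not specify which).

```shell
# Hypothetical helper: read a GET /management/v0/models response on stdin and
# print the id of the first entry whose huggingFaceId equals $1 (nothing if absent).
# `..` visits every value, so a bare array and a paging wrapper both work.
find_model_id() {
  jq -r --arg hf "$1" \
    '[.. | objects | select(.huggingFaceId == $hf) | .id][0] // empty'
}

# Usage (size=50 likely caps each page at 50 entries; request further
# pages if your catalog is larger):
# curl -s "https://[your-domain].rational.is/api/management/v0/models?page=0&size=50" \
#   -H "Accept: application/json" -H "Authorization: Bearer $TOKEN" \
#   | find_model_id google/gemma-4-12B-it
```

An empty result means the repo is not in the pages you fetched, so continue with Step 2.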

Step 2 — Register the model

POST /management/v0/models/register with a RegisterModelRequest body.

Required fields

| Field | Type | Notes |
|---|---|---|
| name | string | Catalog name. Convention: the HF repo ID, e.g. google/gemma-4-12B-it. |
| type | enum | One of base, awq, ggml, gguf. Use base for full-precision HF weights; use gguf or awq for quantized variants. |
| isFineTuned | boolean | false for a stock open-source model. |

Additional fields

| Field | Type | Notes |
|---|---|---|
| huggingFaceId | string | The HF repo ID. Setting it is what makes the record an open-source (Hugging Face) model. |
| publisher | string | e.g. google. |
| numberOfParams | string | Parameter count as a string, e.g. "12000000000". |
| quant | string | Quantization label, if applicable. |
| size | string | On-disk size, if known. |
| description | string | Free text / model card summary. |
| card | uri | Link to the HF model card. |
| defaultParameters | object | ModelDefaultParameters: context length, GPU offload, RoPE settings, etc. |
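If you register models from scripts, building the body with jq avoids hand-quoting JSON and lets you reject an invalid type before calling the API. This is a sketch, not part of the Control Room tooling: the function name `register_body` and its argument order are hypothetical, and it fills in only the fields a stock HF model typically needs (name and huggingFaceId both set to the repo ID, per the convention above).

```shell
# Hypothetical helper: build a RegisterModelRequest body for a stock HF model.
# Args: <huggingFaceId> <type> <publisher>
# name reuses the repo ID (the documented convention); card is derived from it.
register_body() {
  # Reject values outside the documented type enum before calling the API
  case "$2" in
    base|awq|ggml|gguf) ;;
    *) echo "register_body: type must be base, awq, ggml or gguf (got '$2')" >&2
       return 1 ;;
  esac
  # jq -n builds the JSON from scratch; --arg handles quoting and escaping
  jq -n --arg hf "$1" --arg type "$2" --arg publisher "$3" '{
    name: $hf,
    type: $type,
    isFineTuned: false,
    publisher: $publisher,
    huggingFaceId: $hf,
    card: ("https://huggingface.co/" + $hf)
  }'
}

# Usage:
# curl -s -X POST "https://[your-domain].rational.is/api/management/v0/models/register" \
#   -H "Content-Type: application/json" -H "Authorization: Bearer $TOKEN" \
#   -d "$(register_body google/gemma-4-12B-it base google)"
```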

Example

curl -s -X POST "https://[your-domain].rational.is/api/management/v0/models/register" \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-H "Authorization: Bearer $TOKEN" \
-d '{
"name": "google/gemma-4-12B-it",
"type": "base",
"isFineTuned": false,
"publisher": "google",
"numberOfParams": "12000000000",
"description": "Gemma 4 12B instruction-tuned",
"card": "https://huggingface.co/google/gemma-4-12B-it",
"huggingFaceId": "google/gemma-4-12B-it"
}'

Success: 201 Created, returns the full model record including its id (a UUID). Save that id.

⚠️409 CONFLICT

A model with that name already exists. List the catalog and reuse the existing id.
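In a script, it is more reliable to branch on the HTTP status than to parse error text. The sketch below relies on curl's `-o` (write the body to a file) and `-w '%{http_code}'` (print only the status code), then prints the new id on 201 and stops on 409. The helper name, the `$BODY` variable, and the temp-file path are hypothetical; jq is assumed.

```shell
# Hypothetical helper: interpret a register response.
# Args: <http_status> <body_file>. Prints the model id on 201 Created.
handle_register_response() {
  case "$1" in
    201) jq -r '.id' "$2" ;;
    409) echo "409 Conflict: name already taken; list the catalog and reuse the existing id" >&2
         return 1 ;;
    *)   echo "Unexpected HTTP $1:" >&2
         cat "$2" >&2
         return 1 ;;
  esac
}

# Usage ($BODY holds the RegisterModelRequest JSON):
# status=$(curl -s -o /tmp/register-response.json -w '%{http_code}' -X POST \
#   "https://[your-domain].rational.is/api/management/v0/models/register" \
#   -H "Content-Type: application/json" -H "Authorization: Bearer $TOKEN" \
#   -d "$BODY")
# MODEL_ID=$(handle_register_response "$status" /tmp/register-response.json)
```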

Step 3 — Deploy the model

POST /management/v0/models/{id}/deploy with a DeploymentParameters body. {id} is the model UUID returned in Step 2, or the existing id you found in Step 1.

Body fields

| Field | Type | Required | Notes |
|---|---|---|---|
| forceDeployment | boolean | yes | Set to true to proceed even if a deployment already exists, or to force a fresh rollout. |
| temperature | float | no | Default sampling temperature. |
| topP | float | no | Nucleus sampling. |
| topK | float | no | Top-k sampling. |
| minP | float | no | Minimum-probability cutoff. |
| repetitionPenalty | float | no | Repetition penalty. |
| maxOutputTokens | int | no | Default maximum number of output tokens. |
ℹ️REMEMBER

Existing deployments in this cluster also carry an adapterScale value (e.g. 1). Include "adapterScale": 1 to match the established pattern.

Example

curl -s -X POST "https://[your-domain].rational.is/api/management/v0/models/<MODEL_ID>/deploy" \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-H "Authorization: Bearer $TOKEN" \
-d '{
"forceDeployment": true,
"adapterScale": 1
}'

Success: 201 Created, returns the deploymentId as plain text, e.g. model-google-gemma-4-12-b-it-420f1c9176bc.
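Because the sampling fields are optional, a script should send only the ones you actually set. This sketch reads three of them from environment variables (the variable names and the `deploy_body` function are hypothetical), always includes forceDeployment and the adapterScale value noted above, and omits anything unset. The remaining fields (topK, minP, repetitionPenalty) would follow the same pattern. It assumes jq 1.5 or later.

```shell
# Hypothetical helper: build a DeploymentParameters body.
# forceDeployment and adapterScale are always sent; TEMPERATURE, TOP_P and
# MAX_OUTPUT_TOKENS are added as numbers only when the variable is non-empty.
# opt() turns a non-empty string into {key: number} and an empty one into {}.
deploy_body() {
  jq -n \
    --arg temperature "${TEMPERATURE:-}" \
    --arg topP "${TOP_P:-}" \
    --arg maxOutputTokens "${MAX_OUTPUT_TOKENS:-}" '
    def opt($k; $v): if $v == "" then {} else {($k): ($v | tonumber)} end;
    {forceDeployment: true, adapterScale: 1}
    + opt("temperature"; $temperature)
    + opt("topP"; $topP)
    + opt("maxOutputTokens"; $maxOutputTokens)'
}

# Usage: the response body is the deploymentId as plain text, so no jq is needed.
# DEPLOYMENT_ID=$(curl -s -X POST \
#   "https://[your-domain].rational.is/api/management/v0/models/$MODEL_ID/deploy" \
#   -H "Content-Type: application/json" -H "Authorization: Bearer $TOKEN" \
#   -d "$(TEMPERATURE=0.7 deploy_body)")
```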

Step 4 — Verify deployment status

Fetch the model record and inspect its deployments array:

curl -s "https://[your-domain].rational.is/api/management/v0/models/<MODEL_ID>" \
-H "Accept: application/json" \
-H "Authorization: Bearer $TOKEN"

Each deployment shows:

  • deploymentId — the serving instance name.
  • status — provisioning lifecycle (e.g. waitingForChat once the pod is up and accepting traffic).
  • gpuCount — GPUs allocated.
  • serviceUrl — in-cluster endpoint, e.g. http://<deploymentId>.rational-ai.svc.cluster.local/v1/ (OpenAI-compatible /v1/ path).
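To read those fields at a glance, the jq filter below prints one tab-separated line per entry in the deployments array. It is a sketch that assumes jq and the field names listed above; the function name is hypothetical, and missing fields print as empty columns.

```shell
# Hypothetical helper: read a model record (GET .../models/<MODEL_ID>) on stdin
# and print deploymentId, status, gpuCount and serviceUrl, tab-separated.
summarise_deployments() {
  jq -r '.deployments[]?
    | [(.deploymentId // ""), (.status // ""),
       (.gpuCount // "" | tostring), (.serviceUrl // "")]
    | @tsv'
}

# Usage:
# curl -s "https://[your-domain].rational.is/api/management/v0/models/$MODEL_ID" \
#   -H "Accept: application/json" -H "Authorization: Bearer $TOKEN" \
#   | summarise_deployments
```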

You can also poll the dedicated status endpoint:

curl -s "https://[your-domain].rational.is/api/management/v0/models/<DEPLOYMENT_ID>/deployment-status" \
-H "Accept: application/json" \
-H "Authorization: Bearer $TOKEN"
ℹ️REMEMBER

Deployment takes a few minutes — the pod must pull the weights before it becomes ready.
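Rather than re-running the GET by hand, you can poll until the deployment reaches waitingForChat. This is a minimal sketch, assuming jq and the deployments array described above; `wait_for_deployment` and `fetch_model` are hypothetical names, and the fetch command is passed in so the loop can be exercised without a live cluster.

```shell
# Hypothetical helper: poll until a deployment reports the target status.
# Args: <fetch_cmd> <deployment_id> <target_status> [max_attempts] [interval_s]
# fetch_cmd is any command that prints the model record JSON.
wait_for_deployment() {
  local fetch="$1" dep="$2" target="$3"
  local attempts="${4:-60}" interval="${5:-10}" i=1 status
  while [ "$i" -le "$attempts" ]; do
    # Pick the status of the matching entry in the deployments array
    status=$("$fetch" | jq -r --arg d "$dep" \
      '.deployments[]? | select(.deploymentId == $d) | .status')
    echo "attempt $i/$attempts: ${status:-not found}" >&2
    if [ "$status" = "$target" ]; then
      return 0
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  return 1
}

# Usage (60 attempts x 10 s gives the pod about ten minutes to pull weights):
# fetch_model() {
#   curl -s "https://[your-domain].rational.is/api/management/v0/models/$MODEL_ID" \
#     -H "Accept: application/json" -H "Authorization: Bearer $TOKEN"
# }
# wait_for_deployment fetch_model "$DEPLOYMENT_ID" waitingForChat 60 10
```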

Managing deployments

| Action | Endpoint |
|---|---|
| List all deployments | GET /management/v0/models/deployments |
| Pause a deployment | POST /management/v0/models/{deploymentId}/pause |
| Cancel a deployment | POST /management/v0/models/{deploymentId}/cancel |
| Update model record | PATCH /management/v0/models/{id} |
| Delete model | DELETE /management/v0/models/{id} |
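The table above maps cleanly onto a small dispatcher. This sketch (the function names are hypothetical) prints the method and path for each action and sends body-less requests; it assumes pause, cancel and delete need no request body, which this guide does not state. Update is a PATCH that needs a JSON body, so call curl directly with -d for that one.

```shell
# Hypothetical helper: print "<METHOD> <path>" for a management action.
# Paths are relative to https://[your-domain].rational.is/api.
mgmt_route() {
  local action="$1" id="$2"
  case "$action" in
    list) echo "GET /management/v0/models/deployments"; return 0 ;;
    pause|cancel|update|delete) ;;
    *) echo "mgmt_route: unknown action '$action'" >&2; return 1 ;;
  esac
  if [ -z "$id" ]; then
    echo "mgmt_route: '$action' needs an id" >&2
    return 1
  fi
  case "$action" in
    pause|cancel) echo "POST /management/v0/models/$id/$action" ;;  # id = deploymentId
    update)       echo "PATCH /management/v0/models/$id" ;;         # id = model id
    delete)       echo "DELETE /management/v0/models/$id" ;;        # id = model id
  esac
}

# Send a body-less request for the action (not suitable for update).
mgmt_call() {
  local route method path
  route=$(mgmt_route "$@") || return 1
  method=${route%% *}   # text before the first space
  path=${route#* }      # text after the first space
  curl -s -X "$method" "https://[your-domain].rational.is/api$path" \
    -H "Accept: application/json" -H "Authorization: Bearer $TOKEN"
}

# Usage: mgmt_call pause "$DEPLOYMENT_ID"
```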

Choosing the right type

  • base — full-precision Hugging Face weights (FP16/BF16). Highest quality, highest VRAM. Use for standard *-it repos.
  • gguf — GGUF-quantized (llama.cpp family). Smaller footprint, runs on less VRAM. Point huggingFaceId at a GGUF repo.
  • awq — AWQ-quantized. Good quality/size tradeoff for supported architectures.
  • ggml — legacy GGML format.

Pick the type that matches the actual format of the HF repo you reference — the field tells the serving backend how to load the weights.
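As a first guess only, quantized repos on Hugging Face often carry the format in their name. The sketch below applies that naming heuristic; it is an assumption, not something the Control Room does, so always confirm against the actual files in the repo before registering. The function name is hypothetical.

```shell
# Heuristic only (assumption): guess the `type` value from the repo ID.
# Matching is case-insensitive; anything without a format hint defaults to base.
guess_model_type() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    *gguf*) echo gguf ;;
    *awq*)  echo awq ;;
    *ggml*) echo ggml ;;
    *)      echo base ;;
  esac
}

# Usage: guess_model_type google/gemma-4-12B-it   # prints base
```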

Troubleshooting

  • 404 Cannot POST ... — you hit the wrong origin/path. Calls must go to https://[your-domain].rational.is/api/.... If scripting from a browser, make sure the active page is on the Control Room origin so relative /api/... fetches resolve correctly.
  • 409 Conflict on register — name already taken; reuse the existing model id.
  • Deployment stuck before waitingForChat — the pod is still pulling weights or waiting on GPU scheduling. Check cluster GPU capacity; large models need proportionally more.
  • Architecture mismatch — the Control Room may record a generic architecture value on register. For newer/multimodal architectures, confirm the serving backend actually supports that model architecture before relying on non-text capabilities.
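For the 404 case, a cheap guard is to check the base URL before any call. This hypothetical preflight accepts only the documented shape https://<tenant>.rational.is/api (no trailing slash) and also catches a [your-domain] placeholder that was never replaced.

```shell
# Hypothetical preflight: fail fast unless $1 looks like https://<tenant>.rational.is/api
check_api_base() {
  local host
  case "$1" in
    https://*.rational.is/api) host=${1#https://}; host=${host%.rational.is/api} ;;
    *) host="" ;;
  esac
  # Tenant must be non-empty, contain no path, and not be the literal placeholder
  case "$host" in
    ""|*/*|*\[*)
      echo "check_api_base: expected https://<tenant>.rational.is/api, got '$1'" >&2
      return 1 ;;
  esac
}

# Usage: BASE="https://acme.rational.is/api"; check_api_base "$BASE" && curl -s "$BASE/..."
```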

Quick reference — full flow

# Run as a bash script (needs curl and jq; export TOKEN first)
BASE="https://[your-domain].rational.is/api/management/v0/models"

# 1. Register (jq prints nothing if no id comes back, e.g. on 409 Conflict)
MODEL_ID=$(curl -s -X POST "$BASE/register" \
-H "Content-Type: application/json" -H "Authorization: Bearer $TOKEN" \
-d '{"name":"google/gemma-4-12B-it","type":"base","isFineTuned":false,
"publisher":"google","huggingFaceId":"google/gemma-4-12B-it",
"card":"https://huggingface.co/google/gemma-4-12B-it"}' \
| jq -r '.id // empty')
if [ -z "$MODEL_ID" ]; then
  echo "Register returned no id (409 = name taken: list the catalog and reuse its id)" >&2
  exit 1
fi

# 2. Deploy (the response body is the deploymentId as plain text)
DEPLOYMENT_ID=$(curl -s -X POST "$BASE/$MODEL_ID/deploy" \
-H "Content-Type: application/json" -H "Authorization: Bearer $TOKEN" \
-d '{"forceDeployment":true,"adapterScale":1}')
echo "deploymentId: $DEPLOYMENT_ID"

# 3. Verify
curl -s "$BASE/$MODEL_ID" \
-H "Authorization: Bearer $TOKEN" | jq '.deployments'