The SDK spins up a local model server and lets you chat with it using a simple API.

Run a model

Context Managers

The context manager automatically starts the model and cleans it up for you when the block exits.
from ramalama_sdk import RamalamaModel

with RamalamaModel(model="tinyllama") as model:
    response = model.chat("How tall is Michael Jordan?")
    print(response["content"])

Manual Management

It’s also possible to manage the model’s run state manually.
from ramalama_sdk import RamalamaModel

model_name = "tinyllama"
model = RamalamaModel(model=model_name)
model.download()  # fetch and cache the model locally
model.serve()     # start the local model server
Once the model is serving, you can chat with it through the SDK or call the local OpenAI-compatible endpoint yourself (a sketch of a direct call follows the example below). Stop the model when you are finished:
try:
    response = model.chat("How tall is Michael Jordan?")
    print(response["content"])
finally:
    model.stop()  # shut the server down even if chat() raises
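
If you prefer to call the endpoint directly, the snippet below is a minimal sketch using requests. It assumes the server is listening at http://localhost:8080/v1; the actual host and port depend on how the server was configured.

import requests

# Assumed address; check your configuration for the real host and port.
base_url = "http://localhost:8080/v1"

payload = {
    "model": "tinyllama",
    "messages": [{"role": "user", "content": "How tall is Michael Jordan?"}],
}
resp = requests.post(f"{base_url}/chat/completions", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])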

Download models

Use download() to fetch and cache models before serving. The model identifier controls where the SDK pulls from. Common prefixes include:
  • HuggingFace: hf://
  • Ollama: ollama://
  • OCI (any OCI image repository): oci://
  • ModelScope: modelscope://
  • File: file://
from ramalama_sdk import RamalamaModel

model = RamalamaModel(model="hf://ggml-org/gpt-oss-20b-GGUF")
model.download()
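
The same call works for the other transports. The identifiers below are illustrative placeholders, not real artifacts:

from ramalama_sdk import RamalamaModel

# Illustrative identifiers; substitute the models you actually want to pull.
RamalamaModel(model="ollama://tinyllama").download()                      # Ollama registry
RamalamaModel(model="oci://quay.io/example/tinyllama:latest").download()  # OCI image repository (hypothetical image)
RamalamaModel(model="file:///models/tinyllama.gguf").download()           # local file (hypothetical path)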

Instantiating a model

You can pass runtime overrides when creating a model session:
from ramalama_sdk import RamalamaModel

model = RamalamaModel(
    model="tinyllama",
    base_image=None,
    temp=0.7,
    ngl=20,
    max_tokens=256,
    threads=8,
    ctx_size=4096,
    timeout=30,
)
Parameter  | Type          | Description                                                   | Default
model      | str           | Model name or identifier.                                     | required
base_image | str or None   | Container image to use for serving, if different from config. | None
temp       | float or None | Temperature override for sampling.                            | None
ngl        | int or None   | GPU layers override.                                          | None
max_tokens | int or None   | Maximum tokens for completions.                               | None
threads    | int or None   | CPU threads override.                                         | None
ctx_size   | int or None   | Context window override.                                      | None
timeout    | int           | Seconds to wait for server readiness.                         | 30
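
These overrides compose with either lifecycle style. A minimal sketch using the context manager from earlier (the override values are illustrative, not tuned recommendations):

from ramalama_sdk import RamalamaModel

# Illustrative override values; adjust them for your hardware and use case.
with RamalamaModel(model="tinyllama", temp=0.2, ctx_size=4096, timeout=60) as model:
    response = model.chat("Summarize the rules of basketball in one sentence.")
    print(response["content"])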