RamaLama runtime images are minimal, security‑hardened containers that package an inference engine without any model files.
Use them when you want to manage models separately (versioning, provenance, air‑gapped environments) or need fine‑grained control over mounts and updates.
## When to use runtimes

- Isolate the execution environment from model content for stricter change control
- Update model files without rebuilding container images
- Pin or roll back runtime versions independently of models
- Support multiple models on the same host via mounts
## Supported flavors

Common runtime images include:

- rlcr.io/ramalama/llamacpp-cpu-distroless:latest — CPU‑only
- rlcr.io/ramalama/llamacpp-cuda-distroless:latest — NVIDIA CUDA (requires the NVIDIA Container Toolkit when using Docker)

Additional hardware variants may be available (e.g., ROCm, Intel GPU). Check the registry for images matching your hardware.
For NVIDIA + Docker, install the NVIDIA Container Toolkit before running GPU containers.
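As a rough sketch of that setup on a Docker host (assuming the toolkit package itself is already installed from NVIDIA's repositories, and using an illustrative CUDA image tag for the sanity check):

```bash
# Register the NVIDIA runtime with Docker, then restart the daemon.
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Quick sanity check that GPU containers work (image tag is illustrative).
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```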
## Run with a local model directory
Mount a directory containing your .gguf model and point the runtime to the file with --model.
### Docker (CPU)
```bash
docker run --rm -p 8080:8080 \
  -v "$PWD/models:/models:ro" \
  rlcr.io/ramalama/llamacpp-cpu-distroless:latest \
  --model /models/gemma-3-270m-it-Q6_K.gguf --host 0.0.0.0 --port 8080
```
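The GPU and Podman variants follow the same pattern. The commands below are sketches rather than verbatim reference: they assume Docker's standard `--gpus` flag and, for Podman, CDI device access set up through the NVIDIA Container Toolkit. Adjust the model filename to match the file you actually downloaded.

### Docker (CUDA)

```bash
# Expose all GPUs to the container via the NVIDIA Container Toolkit.
docker run --rm -p 8080:8080 --gpus all \
  -v "$PWD/models:/models:ro" \
  rlcr.io/ramalama/llamacpp-cuda-distroless:latest \
  --model /models/gemma-3-270m-it-Q6_K.gguf --host 0.0.0.0 --port 8080
```

### Podman (CPU)

The Podman invocation mirrors Docker; on SELinux systems you may need the `:Z` volume option to relabel the mounted directory.

```bash
podman run --rm -p 8080:8080 \
  -v "$PWD/models:/models:ro" \
  rlcr.io/ramalama/llamacpp-cpu-distroless:latest \
  --model /models/gemma-3-270m-it-Q6_K.gguf --host 0.0.0.0 --port 8080
```

### Podman (CUDA)

A sketch assuming GPU access through CDI (generate the spec first with `nvidia-ctk cdi generate`):

```bash
# Pass all NVIDIA GPUs to the container using the CDI device name.
podman run --rm -p 8080:8080 --device nvidia.com/gpu=all \
  -v "$PWD/models:/models:ro" \
  rlcr.io/ramalama/llamacpp-cuda-distroless:latest \
  --model /models/gemma-3-270m-it-Q6_K.gguf --host 0.0.0.0 --port 8080
```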
## Compose example
Define the runtime service and mount your model directory read‑only at /models.
### docker-compose.yaml (CPU)
```yaml
services:
  llama:
    image: rlcr.io/ramalama/llamacpp-cpu-distroless:latest
    command: ["--model", "/models/gemma-3-270m-it-Q6_K.gguf", "--host", "0.0.0.0", "--port", "8080"]
    volumes:
      - ./models:/models:ro
    ports:
      - "8080:8080"
    restart: unless-stopped
```
### docker-compose.yaml (CUDA)
```yaml
services:
  llama-gpu:
    image: rlcr.io/ramalama/llamacpp-cuda-distroless:latest
    command: ["--model", "/models/gemma-3-270m-it-Q6_K.gguf", "--host", "0.0.0.0", "--port", "8080"]
    volumes:
      - ./models:/models:ro
    ports:
      - "8080:8080"
    gpus: all
    restart: unless-stopped
```
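To bring either service up and smoke-test it, something like the following should work. The endpoint paths are an assumption based on upstream llama.cpp's llama-server (which exposes a `/health` check and an OpenAI-compatible API), not something this page specifies:

```bash
# Start the service in the background.
docker compose up -d

# Check that the server is responding (assumes llama.cpp's /health endpoint).
curl -s http://localhost:8080/health

# Send a minimal OpenAI-style chat completion request.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```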
## RamaLama CLI (override image)
The CLI auto‑detects your hardware and chooses an image, but you can override it explicitly:
```bash
ramalama serve --image rlcr://llamacpp-cuda-distroless:latest rlcr://gemma3-270m
```
## Next steps

- See deployment patterns: /pages/deploying/compose
- Learn about OCI‑packaged models: /pages/artifacts/model