Model images package both an inference runtime (e.g., llama.cpp) and a specific model into a single container image. They’re ideal for quick starts, demos, single‑purpose services, and environments where simplicity is preferred over component isolation.

When to use model images

  • Fastest way to get an endpoint running
  • Minimal choices: no need to choose a runtime or mount model files
  • Great for laptops, POCs, and small dedicated services
If you need stronger isolation or to manage model files independently, see /pages/artifacts/runtime and /pages/artifacts/model.

Quick start

docker pull rlcr.io/ramalama/gemma3-270m:latest
docker run --rm -p 8080:8080 rlcr.io/ramalama/gemma3-270m:latest
Test the OpenAI‑compatible API:
curl -s http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"gemma3-270m","messages":[{"role":"user","content":"Say hello in one sentence"}]}'
You can find the full catalogue of RamaLama Labs images here.

Compose

docker-compose.yaml
services:
  ai:
    image: rlcr.io/ramalama/gemma3-270m:latest
    ports:
      - "8080:8080"
    restart: unless-stopped
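To start the service in the background and confirm it responds, assuming the port mapping above:
# Start the service detached
docker compose up -d

# Same request as the quick start
curl -s http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"gemma3-270m","messages":[{"role":"user","content":"Say hello in one sentence"}]}'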

Notes on updates, tags, and hardware

  • Examples use :latest; pin tags in production for repeatability (see the pinning example after this list)
  • Images are rebuilt and scanned regularly for security and performance
  • Hardware acceleration is chosen by the underlying image; for advanced control, use runtimes directly
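One way to pin is to pull by immutable digest instead of a moving tag; the digest below is a placeholder, so substitute the one published for the image you deploy:
# Pull by digest rather than :latest (digest shown is a placeholder)
docker pull rlcr.io/ramalama/gemma3-270m@sha256:<digest>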

See also

  • Manage models separately: /pages/artifacts/model
  • Engines only (mount a model): /pages/artifacts/runtime