Use Docker Compose to run RamaLama with either turnkey model images or a base runtime with a model mounted as a volume. All examples expose an OpenAI‑compatible API on port 8080 by default.

Production Deployments: runtime + model volume

Following this strategy, you deploy an isolated, hardened runtime image while mounting your desired models into the /models directory of the container. This separation gives you finer-grained control over the lifecycle and deployment of your application.
1. Install ORAS (optional)

If you use RamaLama’s OCI‑packaged models, install a tool like ORAS to pull them locally. You can also use models from other providers (HuggingFace, Ollama, etc.).
brew install oras
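No Homebrew? ORAS also ships release binaries. The version and asset name below are assumptions, so check https://github.com/oras-project/oras/releases for the current release for your platform.
curl -sSLO "https://github.com/oras-project/oras/releases/download/v1.2.0/oras_1.2.0_linux_amd64.tar.gz"
tar -xzf oras_1.2.0_linux_amd64.tar.gz oras
sudo mv oras /usr/local/bin/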
2. Download the model to the node

With ORAS you can pull our models directly into your desired directory.
oras pull rlcr.io/ramalama/gemma3-270m:gguf -o ./models/
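The pull places the GGUF file under ./models/. List it to confirm the exact filename, since that is what the --model flag expects in the next step (the layout inside ./models/ depends on how the artifact was packaged).
find ./models -name "*.gguf"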
3. Create docker-compose.yaml

Define the runtime service and mount your model directory read‑only at /models.
services:
  llama:
    image: rlcr.io/ramalama/llamacpp-cpu-distroless:latest
    command: ["llama-server", "--model", "/models/gemma-3-270m-it-Q6_K.gguf", "--host", "0.0.0.0", "--port", "8080"]
    volumes:
      - ./models:/models:ro  # bind mount containing your .gguf
    ports:
      - "8080:8080"
    restart: unless-stopped
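As a variation, Compose variable interpolation lets you swap models without editing the file. MODEL_FILE here is a hypothetical variable you could set in your shell or a .env file, defaulting to the file pulled above.
services:
  llama:
    image: rlcr.io/ramalama/llamacpp-cpu-distroless:latest
    command: ["llama-server", "--model", "/models/${MODEL_FILE:-gemma-3-270m-it-Q6_K.gguf}", "--host", "0.0.0.0", "--port", "8080"]
    volumes:
      - ./models:/models:ro
    ports:
      - "8080:8080"
    restart: unless-stopped
You can then start it with, for example, MODEL_FILE=<your-model>.gguf docker compose up -d, where <your-model>.gguf is a placeholder for any GGUF in ./models/.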
4. Start the stack

docker compose up -d
No Compose? You can run directly with Docker or Podman using the same volume mount.
docker run --rm -p 8080:8080 \
  -v "$PWD/models/gemma-3-1b-it:/models:ro" \
  rlcr.io/ramalama/llamacpp-cpu-distroless:latest \
  --model /models/gemma-3-1b-it-Q6_K.gguf --host 0.0.0.0 --port 8080
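Whichever way you started the server, a quick request against the OpenAI-compatible endpoint confirms it is serving. The /v1/chat/completions route shown here is the one llama-server conventionally exposes, and the model field can usually be omitted since only one model is loaded.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hello in one short sentence."}]}'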

Podman: Image-as-volume

Podman users can also mount a container image directly as a read‑only volume, removing the need for a local models directory. We build mountable artifacts using the :<file_type>-image tag structure, for example :gguf-image.
podman run --rm -p 8080:8080 \
  --mount type=image,src=rlcr.io/ramalama/gemma3-270m:gguf-image,target=/artifact,ro=true \
  rlcr.io/ramalama/llamacpp-cpu-distroless:latest \
  --model /artifact/models/<exact-file>.gguf --host 0.0.0.0 --port 8080

Other Notes

If you’re ever stuck identifying information about a RamaLama model or image, inspect the labels attached to our artifacts. These include information about:
  1. Model provenance
  2. Model filename / location
  3. Runtime build information
  4. and much more
All of this metadata is attached under the com.ramalama namespace and can be inspected with any of the common image tools, including docker, podman, and oras. For example, you can build the model’s full path from com.ramalama.model.file.location and com.ramalama.model.file.name:
docker image inspect rlcr.io/ramalama/gemma3-270m:latest \
  --format '{{index .Config.Labels "com.ramalama.model.file.location"}}/{{index .Config.Labels "com.ramalama.model.file.name"}}'
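Putting the two together, the labels can drive the Podman image-as-volume run instead of hard-coding the filename. This is a sketch: it assumes the :gguf-image tag carries the same com.ramalama labels and that com.ramalama.model.file.location is an absolute path to the model directory inside the mounted artifact.
podman pull rlcr.io/ramalama/gemma3-270m:gguf-image
MODEL_PATH=$(podman image inspect rlcr.io/ramalama/gemma3-270m:gguf-image \
  --format '{{index .Config.Labels "com.ramalama.model.file.location"}}/{{index .Config.Labels "com.ramalama.model.file.name"}}')
podman run --rm -p 8080:8080 \
  --mount type=image,src=rlcr.io/ramalama/gemma3-270m:gguf-image,target=/artifact,ro=true \
  rlcr.io/ramalama/llamacpp-cpu-distroless:latest \
  --model "/artifact${MODEL_PATH}" --host 0.0.0.0 --port 8080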