Use Docker Compose to run RamaLama with either turnkey model images or a base runtime with a model mounted as a volume. All examples expose an OpenAI‑compatible API on port 8080 by default.

Production Deployments: runtime + model volume

Following this strategy, you deploy an isolated, hardened runtime image while mounting your desired models into the /models directory of the container. This separation gives you finer-grained control over the lifecycle and deployment of your application.
1. Install ORAS (optional)

If you use RamaLama’s OCI‑packaged models, install a tool like ORAS to pull them locally. You can also use models from other providers (HuggingFace, Ollama, etc.).
brew install oras
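No Homebrew? ORAS also ships release binaries. The version and asset name below are assumptions, so check https://github.com/oras-project/oras/releases for the current release for your platform.
curl -sSLO "https://github.com/oras-project/oras/releases/download/v1.2.0/oras_1.2.0_linux_amd64.tar.gz"
tar -xzf oras_1.2.0_linux_amd64.tar.gz oras
sudo mv oras /usr/local/bin/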
2. Download the model to the node

With ORAS you can pull our models directly into your desired directory.
oras pull rlcr.io/ramalama/gemma3-270m:gguf -o ./models/
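The pull places the GGUF file under ./models/. List it to confirm the exact filename, since that is what the --model flag expects in the next step (the layout inside ./models/ depends on how the artifact was packaged).
find ./models -name "*.gguf"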
3. Create docker-compose.yaml

Define the runtime service and mount your model directory read‑only at /models.
services:
  llama:
    image: rlcr.io/ramalama/llamacpp-cpu-distroless:latest
    command: ["llama-server", "--model", "/models/gemma-3-270m-it-Q6_K.gguf", "--host", "0.0.0.0", "--port", "8080"]
    volumes:
      - ./models:/models:ro  # bind mount containing your .gguf
    ports:
      - "8080:8080"
    restart: unless-stopped
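As a variation, Compose variable interpolation lets you swap models without editing the file. MODEL_FILE here is a hypothetical variable you could set in your shell or a .env file, defaulting to the file pulled above.
services:
  llama:
    image: rlcr.io/ramalama/llamacpp-cpu-distroless:latest
    command: ["llama-server", "--model", "/models/${MODEL_FILE:-gemma-3-270m-it-Q6_K.gguf}", "--host", "0.0.0.0", "--port", "8080"]
    volumes:
      - ./models:/models:ro
    ports:
      - "8080:8080"
    restart: unless-stopped
You can then start it with, for example, MODEL_FILE=<your-model>.gguf docker compose up -d, where <your-model>.gguf is a placeholder for any GGUF in ./models/.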
4. Start the stack

docker compose up -d
No Compose? You can run directly with Docker or Podman using the same volume mount.
docker run --rm -p 8080:8080 \
  -v "$PWD/models/gemma-3-1b-it:/models:ro" \
  rlcr.io/ramalama/llamacpp-cpu-distroless:latest \
  --model /models/gemma-3-1b-it-Q6_K.gguf --host 0.0.0.0 --port 8080
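Whichever way you started the server, a quick request against the OpenAI-compatible endpoint confirms it is serving. The /v1/chat/completions route shown here is the one llama-server conventionally exposes, and the model field can usually be omitted since only one model is loaded.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hello in one short sentence."}]}'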

Podman: Image-as-volume

Podman users can also mount a container image directly as a read‑only volume, removing the need for a local models directory. We build mountable artifacts using the :<file_type>-image tag structure, for example :gguf-image.
podman run --rm -p 8080:8080 \
  --mount type=image,src=rlcr.io/ramalama/gemma3-270m:gguf-image,target=/artifact,ro=true \
  rlcr.io/ramalama/llamacpp-cpu-distroless:latest \
  --model /artifact/models/<exact-file>.gguf --host 0.0.0.0 --port 8080

Other Notes

If you’re ever stuck identifying information about a RamaLama model or image, inspect the labels attached to our artifacts. These include information about:
  1. Model provenance
  2. Model filename / location
  3. Runtime build information
  4. and much more
All of this metadata is attached under the com.ramalama namespace and can be inspected with any of the common image tools, including docker, podman, and oras. For example, you can build the model’s full path from com.ramalama.model.file.location and com.ramalama.model.file.name:
docker image inspect rlcr.io/ramalama/gemma3-270m:latest \
  --format '{{index .Config.Labels "com.ramalama.model.file.location"}}/{{index .Config.Labels "com.ramalama.model.file.name"}}'
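Putting the two together, the labels can drive the Podman image-as-volume run instead of hard-coding the filename. This is a sketch: it assumes the :gguf-image tag carries the same com.ramalama labels and that com.ramalama.model.file.location is an absolute path to the model directory inside the mounted artifact.
podman pull rlcr.io/ramalama/gemma3-270m:gguf-image
MODEL_PATH=$(podman image inspect rlcr.io/ramalama/gemma3-270m:gguf-image \
  --format '{{index .Config.Labels "com.ramalama.model.file.location"}}/{{index .Config.Labels "com.ramalama.model.file.name"}}')
podman run --rm -p 8080:8080 \
  --mount type=image,src=rlcr.io/ramalama/gemma3-270m:gguf-image,target=/artifact,ro=true \
  rlcr.io/ramalama/llamacpp-cpu-distroless:latest \
  --model "/artifact${MODEL_PATH}" --host 0.0.0.0 --port 8080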