> ## Documentation Index
> Fetch the complete documentation index at: https://docs.ramalama.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Docker Compose

> Run RamaLama in Docker Compose with CPU or GPU.

Use Docker Compose to run RamaLama with either turnkey model images or a base runtime with a model mounted as a volume.
All examples expose an OpenAI‑compatible API on port 8080 by default.

## Production Deployments: runtime + model volume

Following this strategy you will deploy an isolated and hardened runtime image while mounting your desired models into the `/models` directory of the containers.
This isolation allows you finer granularity in managing the lifecycle and deployment of your application.

<Steps>
  <Step title="Install ORAS (optional)">
    If you use RamaLama’s OCI‑packaged models, install a tool like ORAS to pull them locally. You can also use models from other providers (HuggingFace, Ollama, etc.).

    <CodeGroup>
      ```bash title="macOS" theme={"system"}
      brew install oras
      ```

      ```bash title="Linux" theme={"system"}
      VERSION=1.3.0 # see https://github.com/oras-project/oras/releases for the latest
      OS=linux
      ARCH=$(uname -m); case "$ARCH" in x86_64) ARCH=amd64;; aarch64|arm64) ARCH=arm64;; esac
      curl -sSLo /tmp/oras.tgz \
        https://github.com/oras-project/oras/releases/download/v${VERSION}/oras_${VERSION}_${OS}_${ARCH}.tar.gz
      sudo tar -C /usr/local/bin -xzf /tmp/oras.tgz oras
      oras version
      ```
    </CodeGroup>
  </Step>

  <Step title="Download the model to the node">
    With ORAS you can extract our models directly to your desired directory.

    ```bash theme={"system"}
    oras pull rlcr.io/ramalama/gemma3-270m:gguf -o ./models/
    ```
  </Step>

  <Step title="Create docker-compose.yaml">
    Define the runtime service and mount your model directory read‑only at `/models`.

    <CodeGroup>
      ```yaml title="CPU" theme={"system"}
      services:
        llama:
          image: rlcr.io/ramalama/llamacpp-cpu-distroless:latest
          command: ["llama-server", "--model", "/models/gemma-3-270m-it-Q6_K.gguf", "--host", "0.0.0.0", "--port", "8080"]
          volumes:
            - ./models:/models:ro  # bind mount containing your .gguf
          ports:
            - "8080:8080"
          restart: unless-stopped
      ```

      ```yaml title="GPU" theme={"system"}
      services:
        llama-gpu:
          image: rlcr.io/ramalama/llamacpp-cuda-distroless:latest
          command: ["--model", "/models/gemma-3-270m-it-Q6_K.gguf", "--host", "0.0.0.0", "--port", "8080"]
          volumes:
            - ./models:/models:ro
          ports:
            - "8080:8080"
          gpus: all            # requires NVIDIA Container Toolkit
          restart: unless-stopped
      ```
    </CodeGroup>
  </Step>

  <Step title="Start the stack">
    <CodeGroup>
      ```bash title="Docker" theme={"system"}
      docker compose up -d
      ```

      ```bash title="Podman" theme={"system"}
      podman compose up -d
      ```
    </CodeGroup>
  </Step>
</Steps>

<Tip>
  No Compose? You can run directly with Docker or Podman using the same volume mount.

  <CodeGroup>
    ```bash title="Docker" theme={"system"}
    docker run --rm -p 8080:8080 \
      -v "$PWD/models/gemma-3-1b-it:/models:ro" \
      rlcr.io/ramalama/llamacpp-cpu-distroless:latest \
      --model /models/gemma-3-1b-it-Q6_K.gguf --host 0.0.0.0 --port 8080
    ```

    ```bash title="Podman" theme={"system"}
    podman run --rm -p 8080:8080 \
      -v "$PWD/models/gemma-3-1b-it:/models:ro" \
      rlcr.io/ramalama/llamacpp-cpu-distroless:latest \
      --model /models/gemma-3-1b-it-Q6_K.gguf --host 0.0.0.0 --port 8080
    ```
  </CodeGroup>
</Tip>

### Podman: Image-as-volume

For podman users you can also mount a container image directly as a read‑only volume allowing us to bypass the need for a local models directory.
We build mountable artifacts using the `:<file_type>-image` like `:gguf-image` tag structure.

<CodeGroup>
  ```bash title="CPU" theme={"system"}
  podman run --rm -p 8080:8080 \
    --mount type=image,src=rlcr.io/ramalama/gemma3-270m:gguf-image,target=/artifact,ro=true \
    rlcr.io/ramalama/llamacpp-cpu-distroless:latest \
    --model /artifact/models/<exact-file>.gguf --host 0.0.0.0 --port 8080
  ```

  ```bash title="GPU" theme={"system"}
  podman run --rm -p 8080:8080 \
    --mount type=image,src=rlcr.io/ramalama/gemma3-270m:gguf-image,target=/artifact,ro=true \
    --gpus all \
    rlcr.io/ramalama/llamacpp-cuda-distroless:latest \
    --model /artifact/models/<exact-file>.gguf --host 0.0.0.0 --port 8080
  ```
</CodeGroup>

## Other Notes

If you're ever stuck identifying any information about RamaLama models or images you can inspect the label attached to our artifacts.
This includes information about

1. Model provenance
2. Model filename / location
3. Runtime build information
4. and much more

All of this metadata is attached under the `com.ramalama` namespace and can be inspected using any of the most common image tools including docker, podman, and oras.
For example, you can find the model file name under `com.ramalama.model.file.name` by

<CodeGroup>
  ```bash title="docker" theme={"system"}
  docker image inspect rlcr.io/ramalama/gemma3-270m:latest \
    --format '{{index .Config.Labels "com.ramalama.model.file.location"}}/{{index .Config.Labels "com.ramalama.model.file.name"}}'
  ```

  ```bash title="podman" theme={"system"}
  podman image inspect rlcr.io/ramalama/gemma3-270m:latest \
    --format '{{index .Config.Labels "com.ramalama.model.file.location"}}/{{index .Config.Labels "com.ramalama.model.file.name"}}'
  ```

  ```bash title="oras" theme={"system"}
  oras manifest fetch rlcr.io/ramalama/gemma3-270m:gguf \
    | jq -r '(.annotations["com.ramalama.model.file.location"] // "") + "/" + (.annotations["com.ramalama.model.file.name"] // "")'
  ```
</CodeGroup>
