# Model Images

> Turnkey container images that bundle a runtime and a specific model — the fastest path to serving.

Model images package both an inference runtime (e.g., llama.cpp) and a specific model into a single container image.
They’re ideal for quick starts, demos, single‑purpose services, and environments where simplicity is preferred over component isolation.

## When to use model images

* Fastest way to get an endpoint running
* Minimal choices: no need to choose a runtime or mount model files
* Great for laptops, POCs, and small dedicated services

If you need stronger isolation or want to manage model files independently, see `/pages/artifacts/runtime` and `/pages/artifacts/model`.

## Quick start

<CodeGroup>
  ```bash title="Docker" theme={"system"}
  docker pull rlcr.io/ramalama/gemma3-270m:latest
  docker run --rm -p 8080:8080 rlcr.io/ramalama/gemma3-270m:latest
  ```

  ```bash title="Podman" theme={"system"}
  podman pull rlcr.io/ramalama/gemma3-270m:latest
  podman run --rm -p 8080:8080 rlcr.io/ramalama/gemma3-270m:latest
  ```

  ```bash title="RamaLama CLI" theme={"system"}
  ramalama serve --image rlcr.io/ramalama/llamacpp-cpu-distroless rlcr://gemma3-270m
  ```
</CodeGroup>

Test the OpenAI‑compatible API:

<CodeGroup>
  ```bash title="curl" theme={"system"}
  curl -s http://localhost:8080/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{"model":"gemma3-270m","messages":[{"role":"user","content":"Say hello in one sentence"}]}'
  ```

  ```bash title="RamaLama CLI" theme={"system"}
  ramalama chat "Say hello in one sentence"
  ```
</CodeGroup>
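
A freshly started container may need a moment to load the model before it accepts requests. The following is a minimal sketch of a readiness poll, not something the images ship with. It assumes the server answers `GET /v1/models` (part of the OpenAI API surface, but not confirmed on this page for these images), and `wait_for_api` is a hypothetical helper name.

```bash
# Poll an OpenAI-compatible server until it responds or the attempts run out.
# Assumption: GET /v1/models returns a success status once the server is ready.
# Arguments: base URL (default http://localhost:8080), attempts (default 30),
# delay between attempts in seconds (default 1).
wait_for_api() {
  local base_url="${1:-http://localhost:8080}"
  local attempts="${2:-30}"
  local delay="${3:-1}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -s: quiet, -f: treat HTTP errors as failure, -o: discard the body
    if curl -sf -o /dev/null "${base_url}/v1/models"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Usage (after starting the container in another terminal):
#   wait_for_api http://localhost:8080 60 1 && echo "server is ready"
```

Polling from the client side keeps the example independent of how the image was started (Docker, Podman, or the RamaLama CLI).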

<Tip>
  Browse the full catalogue of RamaLama Labs images in the [RamaLama registry](https://registry.ramalama.com/projects/ramalama).
</Tip>

## Compose

```yaml title="docker-compose.yaml" theme={"system"}
services:
  ai:
    image: rlcr.io/ramalama/gemma3-270m:latest
    ports:
      - "8080:8080"
    restart: unless-stopped
```

## Notes on updates, tags, and hardware

* The examples use `:latest`; in production, pin a specific tag so deployments are repeatable
* Images are rebuilt and scanned regularly for security and performance
* Hardware acceleration is determined by the underlying image; for finer control, use a runtime image directly (see `/pages/artifacts/runtime`)
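
One way to enforce tag pinning is a small check in a deployment script. The sketch below is illustrative only: `is_pinned` is a hypothetical helper, and the `1.0.0` tag in the usage comment is a placeholder rather than a tag known to exist in the registry. It treats a reference as pinned when it carries a digest or an explicit tag other than `latest`.

```bash
# Succeed (exit status 0) if an image reference is pinned, fail otherwise.
# Pinned means: a digest (@sha256:...) or an explicit tag that is not "latest".
is_pinned() {
  local ref="$1"
  case "$ref" in
    *@sha256:*) return 0 ;;        # digest references are always pinned
  esac
  # Look only at the last path component so a registry port
  # (for example localhost:5000/name) is not mistaken for a tag.
  local name="${ref##*/}"
  case "$name" in
    *:latest) return 1 ;;          # explicit :latest is a moving tag
    *:?*)     return 0 ;;          # any other non-empty tag
    *)        return 1 ;;          # no tag at all defaults to :latest
  esac
}

# Usage (the 1.0.0 tag is a placeholder):
#   is_pinned "rlcr.io/ramalama/gemma3-270m:1.0.0"  || echo "not pinned"
#   is_pinned "rlcr.io/ramalama/gemma3-270m:latest" || echo "not pinned"
```

A digest reference is the stricter option, since a tag can be repointed to a rebuilt image while a digest cannot.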

## See also

* Manage models separately: `/pages/artifacts/model`
* Engines only (mount a model): `/pages/artifacts/runtime`
