Our containerized AI artifacts are OCI-compatible, so you can use them directly with Docker, Podman, and Kubernetes wherever you need them: in the cloud, a datacenter, or your basement.
Our artifacts are regularly rebuilt, updated, and scanned for vulnerabilities to provide the smallest, fastest, and most secure runtime possible.
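Because the images are standard OCI artifacts, you can pull and inspect them with Podman (or Docker) like any other container image. The sketch below uses the runtime image referenced later in this guide; substitute whichever image you need:
# pull the image from the RamaLama Labs registry
podman pull rlcr.io/ramalama/llamacpp-cpu-distroless
# inspect its metadata (labels, layers, entrypoint)
podman image inspect rlcr.io/ramalama/llamacpp-cpu-distroless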
You can find comparisons between different images on the comparisons page of each image (e.g. for llama.cpp's CUDA and CPU runtimes).
Quick start
Install dependencies
Getting started requires either Docker or Podman. We also recommend the RamaLama CLI for a streamlined experience.
- Install Podman or Docker
- (Optional) Install RamaLama CLI (see the example commands below)
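As an illustration, on a Fedora-style system with Python available, installation might look like the following; package names and preferred install methods vary by platform, so check each project's install docs:
# install Podman from the distro repositories
sudo dnf install podman
# install the RamaLama CLI from PyPI
pip install ramalama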
Run the model
Model images bundle both the runtime and model, providing a single runnable container:
ramalama serve --image rlcr.io/ramalama/llamacpp-cpu-distroless rlcr://gemma3-270m:latest
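Once the container is up, you can sanity-check that the server is responding. This assumes the default port 8080 (the same one referenced in the GUI note below); the /v1/models listing is part of the OpenAI-compatible API most llama.cpp-based servers expose, but availability can vary by image:
# list the models the server is exposing
curl http://localhost:8080/v1/models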
You can find the full catalogue of RamaLama Labs images here.
Get chatting
The endpoint is OpenAI-compatible. Try a quick chat request:
ramalama chat "Say hello in one sentence"
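Because the endpoint follows the OpenAI API shape, any OpenAI-compatible client or plain curl works as well. The port and model name below are assumptions based on the defaults used elsewhere in this guide:
# send a chat completion request to the OpenAI-compatible endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "gemma3-270m",
        "messages": [{"role": "user", "content": "Say hello in one sentence"}]
      }'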
Many of our images come bundled with a web server GUI. If you'd prefer to chat directly with the agent, you can access it at the root URL where the agent is being served (e.g. http://localhost:8080).
Next steps