RamaLama runtime images are minimal, security-hardened containers that package an inference engine without any model files. Use them when you want to manage models separately (versioning, provenance, air-gapped environments) or need fine-grained control over mounts and updates.
## When to use runtimes
- Isolate the execution environment from model content for stricter change control
- Update model files without rebuilding container images
- Pin/roll back runtime versions independently of models
- Support multiple models on the same host via mounts
## Supported flavors
Common runtime images include:

- `rlcr.io/ramalama/llamacpp-cpu-distroless:latest` (CPU-only)
- `rlcr.io/ramalama/llamacpp-cuda-distroless:latest` (NVIDIA CUDA; requires the NVIDIA Container Toolkit when using Docker)
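You can fetch either image ahead of time with a plain pull, for example:

```bash
# Pull the CPU-only runtime image listed above.
docker pull rlcr.io/ramalama/llamacpp-cpu-distroless:latest
```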
## Run with a local model directory
Mount a directory containing your `.gguf` model file and point the runtime at it with `--model`.
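A minimal sketch with Docker: the host path, model filename, and port are placeholders, and it assumes the image's entrypoint accepts standard llama.cpp server flags such as `--host` and `--port` alongside `--model`.

```bash
# Serve a local GGUF file by mounting its directory read-only into the container.
# /path/to/models and my-model.gguf are placeholders; --host/--port are assumed
# to be forwarded to the llama.cpp server entrypoint.
docker run --rm -p 8080:8080 \
  -v /path/to/models:/models:ro \
  rlcr.io/ramalama/llamacpp-cpu-distroless:latest \
  --model /models/my-model.gguf --host 0.0.0.0 --port 8080
```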
## Compose example
Define the runtime service and mount your model directory read-only at `/models`.
### docker-compose.yaml (CPU)
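A minimal sketch; the service name, model filename, and port are assumptions, and `--model` is the flag described above.

```yaml
services:
  llamacpp:
    image: rlcr.io/ramalama/llamacpp-cpu-distroless:latest
    # Placeholder model path; adjust to the .gguf file in your mounted directory.
    command: ["--model", "/models/my-model.gguf", "--host", "0.0.0.0", "--port", "8080"]
    ports:
      - "8080:8080"
    volumes:
      # Model directory mounted read-only at /models, as described above.
      - ./models:/models:ro
```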
### docker-compose.yaml (CUDA)
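The same sketch adapted for CUDA. The GPU reservation uses the Compose specification's `deploy.resources` device syntax, which depends on the NVIDIA Container Toolkit being installed on the host.

```yaml
services:
  llamacpp:
    image: rlcr.io/ramalama/llamacpp-cuda-distroless:latest
    command: ["--model", "/models/my-model.gguf", "--host", "0.0.0.0", "--port", "8080"]
    ports:
      - "8080:8080"
    volumes:
      - ./models:/models:ro
    deploy:
      resources:
        reservations:
          devices:
            # Request one NVIDIA GPU; requires the NVIDIA Container Toolkit.
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```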
## RamaLama CLI (override image)
The CLI auto-detects your hardware and chooses an appropriate image, but you can override it explicitly.
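A sketch, assuming RamaLama's `--image` option; the model name and flag placement are illustrative, so check `ramalama --help` for the exact syntax.

```bash
# Force the CUDA runtime image instead of the auto-detected one.
# "llama3" is a placeholder model reference.
ramalama --image rlcr.io/ramalama/llamacpp-cuda-distroless:latest run llama3
```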
## Next steps

- See deployment patterns: /pages/deploying/compose
- Learn about OCI-packaged models: /pages/artifacts/model

