Model images package both an inference runtime (e.g., llama.cpp) and a specific model into a single container image. They’re ideal for quick starts, demos, single‑purpose services, and environments where simplicity is preferred over component isolation.

## Documentation Index
Fetch the complete documentation index at: https://docs.ramalama.com/llms.txt
Use this file to discover all available pages before exploring further.
## When to use model images
- Fastest way to get an endpoint running
- Minimal choices: no need to choose a runtime or mount model files
- Great for laptops, POCs, and small dedicated services
For more control over the runtime or the model files, see /pages/artifacts/runtime and /pages/artifacts/model.
## Quick start
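A minimal sketch of running a model image with Docker (or Podman). The image reference is a placeholder, and the port assumes the bundled runtime serves HTTP on 8080 — check the image's own documentation for the actual image name and port:

```shell
# Run a model image; replace the placeholder with a real image reference.
# Many llama.cpp-based images expose an HTTP inference endpoint on 8080
# (an assumption here, not a guaranteed default).
docker run --rm -p 8080:8080 <registry>/<model-image>:latest

# Then query the endpoint from the host, e.g.:
# curl http://localhost:8080/v1/models
```

Because the runtime and model are baked into the image, no volume mounts or runtime flags are needed for a basic run.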
## Compose

`docker-compose.yaml`
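A sketch of what the compose file might look like; the image reference and port mapping below are placeholders and assumptions, not official values:

```yaml
# docker-compose.yaml — minimal sketch; image name and port are placeholders
services:
  model:
    # Replace with the model image you want to run; pin a tag in production
    image: <registry>/<model-image>:latest
    ports:
      - "8080:8080"   # expose the inference endpoint on the host
    restart: unless-stopped
```

Bring the service up with `docker compose up -d` and tear it down with `docker compose down`.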
## Notes on updates, tags, and hardware

- Examples use `:latest`; pin tags in production for repeatability
- Images are rebuilt and scanned regularly for security and performance
- Hardware acceleration is chosen by the underlying image; for advanced control, use runtimes directly
## See also

- Manage models separately: /pages/artifacts/model
- Engines only (mount a model): /pages/artifacts/runtime

