Production Deployments: runtime + model volume
Following this strategy, you deploy an isolated, hardened runtime image while mounting your desired models into the /models directory of the containers.
This isolation gives you finer-grained control over the lifecycle and deployment of your application.
1. Install ORAS (optional)
If you use RamaLama’s OCI‑packaged models, install a tool like ORAS to pull them locally. You can also use models from other providers (HuggingFace, Ollama, etc.).
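One common way to install ORAS on Linux is to download a release binary; the version below is an assumption, so check the project's releases page for the latest:

```bash
# Download and install the ORAS CLI (version is an assumption; see
# https://github.com/oras-project/oras/releases for current releases).
VERSION="1.2.0"
curl -LO "https://github.com/oras-project/oras/releases/download/v${VERSION}/oras_${VERSION}_linux_amd64.tar.gz"
tar -xzf "oras_${VERSION}_linux_amd64.tar.gz" oras
sudo mv oras /usr/local/bin/
```

On macOS, brew install oras works as well.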
2. Download the model to the node
With ORAS, you can pull your models directly into your desired directory.
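For example (the registry path and target directory are placeholders for your own artifact and layout):

```bash
# Pull the OCI model artifact and write its files to the target directory.
oras pull quay.io/your-org/your-model:latest --output /var/lib/models
```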
3. Create docker-compose.yaml
Define the runtime service and mount your model directory read-only at /models.
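A minimal sketch of such a compose file, assuming a llama.cpp-based runtime; the image tag, model filename, host path, and port are placeholders, not RamaLama defaults:

```yaml
services:
  llm:
    image: quay.io/ramalama/ramalama:latest  # runtime image; pin a specific tag in production
    command: >
      llama-server
      --model /models/model.gguf
      --host 0.0.0.0
      --port 8080
    ports:
      - "8080:8080"
    volumes:
      - /var/lib/models:/models:ro  # model directory mounted read-only
```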
4. Start the stack
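Bring the service up with your container engine's Compose tooling:

```bash
docker compose up -d
# or, with Podman:
podman-compose up -d
```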
Podman: Image-as-volume
For Podman users, you can also mount a container image directly as a read-only volume, bypassing the need for a local models directory. We build mountable artifacts using a :<file_type>-image tag structure, for example :gguf-image, as in the sketch below.
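A sketch of this approach using Podman's image mounts (image references, model filename, and port are placeholders; image mounts are read-only by default):

```bash
# Mount the model artifact image at /models and serve it with the runtime image.
podman run -d --name llm -p 8080:8080 \
  --mount type=image,src=quay.io/your-org/your-model:gguf-image,dst=/models \
  quay.io/ramalama/ramalama:latest \
  llama-server --model /models/model.gguf --host 0.0.0.0 --port 8080
```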
Other Notes
If you’re ever stuck identifying information about RamaLama models or images, you can inspect the labels attached to our artifacts. These include:
- Model provenance
- Model filename / location
- Runtime build information
- and much more

All of these labels live under the com.ramalama namespace and can be inspected using any of the most common image tools, including docker, podman, and oras.
For example, you can find the model file name under com.ramalama.model.file.name by inspecting the image’s labels, as shown below.
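A sketch of reading that label with podman; docker inspect accepts the same Go-template syntax, and the image reference is a placeholder:

```bash
# Print the model file name recorded in the image's labels.
podman inspect --format '{{ index .Config.Labels "com.ramalama.model.file.name" }}' \
  quay.io/your-org/your-model:latest
```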
