Prerequisites
- Podman (recommended) or Docker installed
- RamaLama installed (`pip install ramalama` or `dnf install python3-ramalama`)
- Optional: GPU drivers/runtime (NVIDIA Container Toolkit, AMD ROCm, etc.)

Check your install:
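A version query is the simplest smoke test; the exact output varies by release:

```
# should print the installed RamaLama version
ramalama version
```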
Serve a model locally
Start a REST API on port 8080 in the background:
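A minimal sketch, assuming a recent release: the model name (`tinyllama`) and container name (`demo`) are placeholders, `-d` detaches the container, and `-p` sets the published port. The smoke test assumes the default llama.cpp-based server, which exposes an OpenAI-compatible endpoint.

```
# serve a model in a detached (background) container on port 8080
ramalama serve -d -p 8080 --name demo tinyllama

# quick smoke test against the OpenAI-compatible endpoint
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "tinyllama", "messages": [{"role": "user", "content": "Say hello"}]}'

# stop the background container when done (it is removed automatically via --rm)
ramalama stop demo
```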
GPU acceleration
RamaLama detects your hardware and picks an accelerated image automatically (`quay.io/ramalama/cuda`, `rocm`, `intel-gpu`, etc.). To override, specify `--image`:
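For example, to force the ROCm image from the list above (the model name is again a placeholder):

```
# override hardware autodetection and use the ROCm-accelerated image
ramalama serve --image quay.io/ramalama/rocm -p 8080 tinyllama
```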
Data and storage
Models are stored under your user data directory (e.g., `~/.local/share/ramalama`).
Use `ramalama list` to see downloaded models and `ramalama rm` to remove them.
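For example (the model reference is a placeholder):

```
# show locally stored models
ramalama list

# remove a model you no longer need to reclaim disk space
ramalama rm tinyllama
```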
Security defaults
RamaLama runs models in rootless containers with `--network=none`, read-only model mounts, and `--rm` cleanup.
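To see exactly which flags would be applied on your system, recent releases provide a dry-run option that prints the underlying podman/docker command without starting anything (spelled `--dryrun` in the versions I've checked; consult `ramalama serve --help` if yours differs, and the model name is a placeholder):

```
# print the container command, including the isolation flags, without running it
ramalama serve --dryrun tinyllama
```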
