RamaLama CLI is a local-first developer toolkit that treats AI models like container images: you can pull, run, and serve them with familiar container-centric workflows. It inspects your hardware and automatically selects an appropriate runtime image for your configuration.
Documentation Index
Fetch the complete documentation index at: https://docs.ramalama.com/llms.txt
Use this file to discover all available pages before exploring further.
The RamaLama CLI is open-source and open to contributors.
Check the project out at https://github.com/containers/ramalama
Installation
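As a hedged sketch of one common install path (verify against the GitHub repository linked above for the full, current instructions), the CLI is distributed as a PyPI package:

```shell
# Install the CLI from PyPI (one of several documented install paths).
pip install ramalama

# Confirm the install and inspect what RamaLama detected
# about the local hardware and container engine.
ramalama version
ramalama info
```

`ramalama info` is also a quick way to see which runtime image the hardware detection has selected before pulling anything.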
Functionality
The CLI includes a variety of useful functions, including:
- Local serving and interaction with AI models
- Packaging containerized AI deployments
- Building optimized deployments for RAG workloads
- and more
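As a hedged sketch of the container-style workflow behind these functions (the model name is illustrative), the core subcommands mirror familiar container tooling:

```shell
# Pull a model, much like pulling a container image
# (the model reference below is illustrative).
ramalama pull ollama://tinyllama

# Chat with the model interactively.
ramalama run ollama://tinyllama

# List the models available locally.
ramalama list
```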
Serve a REST API
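A minimal serving sketch, assuming a listen port of 8080 and an illustrative model name (check `ramalama serve --help` for the exact defaults):

```shell
# Serve an OpenAI-compatible REST API locally
# (model name and port are illustrative assumptions).
ramalama serve --port 8080 ollama://tinyllama

# In another terminal, query the OpenAI-style endpoint.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}]}'
```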
RamaLama makes it easy to work with AI models on your laptop. You can deploy an OpenAI-compatible API with a single command.
Model Repositories
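The same commands accept transport-prefixed model references to select the source registry; a hedged sketch with illustrative model and registry names:

```shell
# Transport prefixes choose the model source
# (all names below are illustrative).
ramalama pull hf://example-org/example-model         # Hugging Face
ramalama pull ollama://tinyllama                     # Ollama registry
ramalama pull oci://registry.example.com/models/llm  # generic OCI artifact

# Serving works the same way regardless of the transport.
ramalama serve oci://registry.example.com/models/llm
```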
RamaLama can serve models from any of the major model providers, including RamaLama Labs, Hugging Face, Ollama, and ModelScope. It also supports generic OCI model artifacts, meaning you can run and serve models from your own or your enterprise's model registry. For example, you can serve an OCI-compatible artifact from Docker's model hub.
Hardware acceleration
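Runtime-image selection is automatic, but it can be pinned explicitly; a hedged sketch, where the image reference is an assumption (consult `ramalama run --help` for the exact flag and image names):

```shell
# Override the auto-selected runtime image with an explicit one
# (the quay.io image reference here is illustrative).
ramalama run --image quay.io/ramalama/rocm ollama://tinyllama
```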
RamaLama inspects your system and chooses a matching runtime image (e.g., CUDA, ROCm, Intel GPU, CPU). However, you can override the default image explicitly with the --image option.
Next steps
Deploy to Production
Learn how to deploy with Docker Compose or Kubernetes
Explore on GitHub
Browse the full documentation, examples, and man pages

