Simple & Familiar
Use familiar container commands to work with AI models. Pull, run, and serve models just like you would with Docker or Podman.
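For example, a typical workflow mirrors everyday container commands (the model name here is the same one used in the Quick Start below):

# Pull a model, chat with it interactively, or serve it over HTTP
ramalama pull granite3-moe
ramalama run granite3-moe
ramalama serve granite3-moe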
Hardware Optimized
Automatically detects your GPU and pulls optimized container images for NVIDIA, AMD, Intel, Apple Silicon, and more.
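To check what was detected on your machine, RamaLama can print details about the host, including the container engine and accelerator it found (exact output fields vary by version):

# Show what RamaLama detected on this host
ramalama info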
Secure by Default
Run models in rootless containers with read-only mounts, network isolation, and automatic cleanup of temporary data.
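Conceptually, those defaults correspond to running the inference container with plain Podman options like the ones below. This is only an illustration of the properties (rootless user, no network, read-only model mount, cleanup on exit), not the exact command RamaLama generates:

# Rough Podman equivalent of the defaults: run as a regular user,
# no network, model mounted read-only, container removed on exit
podman run --rm --network none -v ~/models/model.gguf:/model.gguf:ro <inference-image>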
Quick Start
Install RamaLama and start running AI models in minutes:
# Install via script (Linux/macOS)
curl -fsSL https://ramalama.ai/install.sh | bash
# Run your first model
ramalama run granite3-moe
Supported Registries
- HuggingFace
- ModelScope
- Ollama
- OCI Container Registries (Quay.io, Docker Hub, etc.)
Multiple Model Support
Run models from HuggingFace, ModelScope, Ollama, and OCI registries. Supports popular model formats such as GGUF, among others.
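Models from different registries are addressed with transport prefixes; the model names below are illustrative placeholders:

# Pull from different registries using transport prefixes
ramalama pull huggingface://example-org/example-model-GGUF
ramalama pull ollama://tinyllama
ramalama pull oci://quay.io/example/mymodel:latest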
REST API & Chat Interface
Interact with models through a REST API or use the built-in chat interface. Perfect for both application development and direct interaction.
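As a sketch, `ramalama serve` exposes a local HTTP endpoint that applications can call; with the default llama.cpp-based backend the API is OpenAI-compatible, so a request could look like this (the port and model name are assumptions):

# Serve a model on a local port
ramalama serve --port 8080 granite3-moe
# Query the OpenAI-compatible chat endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}]}'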
RAG Support
Built-in support for Retrieval Augmented Generation (RAG). Convert your documents into vector databases and enhance model responses with your data.
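A minimal sketch of the flow, assuming current command shapes (document paths, image names, and the model name are placeholders):

# Convert documents into a vector-database image
ramalama rag ./docs/manual.pdf ./docs/notes.md quay.io/example/mydocs-rag
# Run a model with that data attached
ramalama run --rag quay.io/example/mydocs-rag granite3-moe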
Cross-Platform
Works on Linux, macOS, and Windows (via WSL2). Supports both Podman and Docker as container engines.
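RamaLama picks up whichever engine is available; to force one explicitly, it can be selected per invocation or via an environment variable (a sketch, assuming the `--engine` option and the `RAMALAMA_CONTAINER_ENGINE` variable):

# Prefer Docker for this invocation
ramalama --engine docker run granite3-moe
# Or set the engine for your whole shell session
export RAMALAMA_CONTAINER_ENGINE=podman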
Performance Benchmarking
Built-in tools to benchmark and measure model performance. Calculate perplexity and compare different models.
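For example (the model name is illustrative):

# Benchmark model throughput
ramalama bench granite3-moe
# Compute perplexity to compare model quality
ramalama perplexity granite3-moe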
Active Community
Join our active Matrix community for support and discussions. RamaLama is open source and welcomes contributions.