Simple & Familiar
Use familiar container commands to work with AI models. Pull, run, and serve models just like you would with Docker or Podman.
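For example, a typical workflow mirrors everyday container commands (the model name here is the same one used in the Quick Start below):

# Pull a model, chat with it interactively, or serve it over HTTP
ramalama pull granite3-moe
ramalama run granite3-moe
ramalama serve granite3-moe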
Hardware Optimized
Automatically detects your GPU and pulls optimized container images for NVIDIA, AMD, Intel, Apple Silicon, and more.
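To check what was detected on your machine, RamaLama can print details about the host, including the container engine and accelerator it found (exact output fields vary by version):

# Show what RamaLama detected on this host
ramalama info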
Secure by Default
Run models in rootless containers with read-only mounts, network isolation, and automatic cleanup of temporary data.
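Conceptually, those defaults correspond to running the inference container with plain Podman options like the ones below. This is only an illustration of the properties (rootless user, no network, read-only model mount, cleanup on exit), not the exact command RamaLama generates:

# Rough Podman equivalent of the defaults: run as a regular user,
# no network, model mounted read-only, container removed on exit
podman run --rm --network none -v ~/models/model.gguf:/model.gguf:ro <inference-image>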
Quick Start
Install RamaLama and start running AI models in minutes:
# Install via script (Linux/macOS)
curl -fsSL https://ramalama.ai/install.sh | bash
# Run your first model
ramalama run granite3-moe
Supported Registries
- HuggingFace
- ModelScope
- Ollama
- OCI Container Registries (Quay.io, Docker Hub, etc.)
Multiple Model Support
Run models from HuggingFace, ModelScope, Ollama, and OCI registries. Supports popular model formats such as GGUF, among others.
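Models from different registries are addressed with transport prefixes; the model names below are illustrative placeholders:

# Pull from different registries using transport prefixes
ramalama pull huggingface://example-org/example-model-GGUF
ramalama pull ollama://tinyllama
ramalama pull oci://quay.io/example/mymodel:latest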
REST API & Chat Interface
Interact with models through a REST API or use the built-in chat interface. Perfect for both application development and direct interaction.
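As a sketch, `ramalama serve` exposes a local HTTP endpoint that applications can call; with the default llama.cpp-based backend the API is OpenAI-compatible, so a request could look like this (the port and model name are assumptions):

# Serve a model on a local port
ramalama serve --port 8080 granite3-moe
# Query the OpenAI-compatible chat endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}]}'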
RAG Support
Built-in support for Retrieval Augmented Generation (RAG). Convert your documents into vector databases and enhance model responses with your data.
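A minimal sketch of the flow, assuming current command shapes (document paths, image names, and the model name are placeholders):

# Convert documents into a vector-database image
ramalama rag ./docs/manual.pdf ./docs/notes.md quay.io/example/mydocs-rag
# Run a model with that data attached
ramalama run --rag quay.io/example/mydocs-rag granite3-moe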
Cross-Platform
Works on Linux, macOS, and Windows (via WSL2). Supports both Podman and Docker as container engines.
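RamaLama picks up whichever engine is available; to force one explicitly, it can be selected per invocation or via an environment variable (a sketch, assuming the `--engine` option and the `RAMALAMA_CONTAINER_ENGINE` variable):

# Prefer Docker for this invocation
ramalama --engine docker run granite3-moe
# Or set the engine for your whole shell session
export RAMALAMA_CONTAINER_ENGINE=podman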
Performance Benchmarking
Built-in tools to benchmark and measure model performance. Calculate perplexity and compare different models.
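For example (the model name is illustrative):

# Benchmark model throughput
ramalama bench granite3-moe
# Compute perplexity to compare model quality
ramalama perplexity granite3-moe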
Active Community
Join our active Matrix community for support and discussions. RamaLama is open source and welcomes contributions.