
Run AI models locally with the simplicity of containers

Simple & Familiar

Use familiar container commands to work with AI models. Pull, run, and serve models just like you would with Docker or Podman.
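For example, a typical session looks much like working with a container engine (a quick sketch; the model name is just an example):

# Pull a model, chat with it, or serve it over HTTP
ramalama pull granite3-moe
ramalama run granite3-moe
ramalama serve granite3-moe

# List the models available locally
ramalama list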

Hardware Optimized

Automatically detects your GPU and pulls optimized container images for NVIDIA, AMD, Intel, Apple Silicon and more.
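To check what was detected on your machine, the info subcommand prints the current configuration (a sketch; the exact fields it reports vary by version):

# Show RamaLama's detected configuration (engine, runtime image, etc.)
ramalama info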

Secure by Default

Run models in rootless containers with read-only mounts, network isolation, and automatic cleanup of temporary data.

Quick Start

Install RamaLama and start running AI models in minutes:

# Install via script (Linux/macOS)
curl -fsSL https://ramalama.ai/install.sh | bash

# Run your first model
ramalama run granite3-moe

Supported Registries

  • HuggingFace
  • ModelScope
  • Ollama
  • OCI Container Registries (Quay.io, Docker Hub, etc.)
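A model's source can be chosen with a transport prefix on its name (a sketch; the placeholder paths are illustrative and the default transport is configurable):

# Pull the same way from different registries
ramalama pull huggingface://<repo>/<model-file>.gguf
ramalama pull ollama://tinyllama
ramalama pull oci://quay.io/<namespace>/<model-image>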
🤖 Multiple Model Support

Run models from HuggingFace, ModelScope, Ollama, and OCI registries. Supports popular model formats such as GGUF.

💬 REST API & Chat Interface

Interact with models through a REST API or use the built-in chat interface. Perfect for both application development and direct interaction.
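For example, a served model can be queried over HTTP (a sketch assuming an OpenAI-compatible /v1/chat/completions endpoint on the default port; adjust to your setup):

# Serve a model, then query it from another terminal
ramalama serve --port 8080 granite3-moe

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'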

📚 RAG Support

Built-in support for Retrieval Augmented Generation (RAG). Convert your documents into vector databases and enhance model responses with your data.
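A sketch of the workflow, assuming the rag subcommand and a --rag option on run (names and argument order may differ in your installed version):

# Convert documents into a vector database packaged as an OCI image
ramalama rag ./docs/manual.pdf quay.io/<namespace>/my-rag-data

# Attach that data when running a model
ramalama run --rag quay.io/<namespace>/my-rag-data granite3-moe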

🖥️ Cross-Platform

Works on Linux, macOS, and Windows (via WSL2). Supports both Podman and Docker as container engines.

📊 Performance Benchmarking

Built-in tools to benchmark and measure model performance. Calculate perplexity and compare different models.
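A sketch using the bench and perplexity subcommands (assuming both are available in your installed version):

# Benchmark a model on the current hardware
ramalama bench granite3-moe

# Measure model perplexity
ramalama perplexity granite3-moe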

👥 Active Community

Join our active Matrix community for support and discussions. RamaLama is open source and welcomes contributions.