Our containerized AI artifacts are OCI-compatible, so you can use them directly with Docker, Podman, and Kubernetes wherever you need them: in the cloud, a datacenter, or your basement.
Our artifacts are regularly rebuilt, updated, and scanned for vulnerabilities to provide the smallest, fastest, and most secure runtime possible.
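Because the images are standard OCI artifacts, you can pull and inspect them with Podman (or Docker) like any other container image. The sketch below uses the runtime image referenced later in this guide; substitute whichever image you need:
# pull the image from the RamaLama Labs registry
podman pull rlcr.io/ramalama/llamacpp-cpu-distroless
# inspect its metadata (labels, layers, entrypoint)
podman image inspect rlcr.io/ramalama/llamacpp-cpu-distroless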
You can find comparisons between different images on the comparisons page of each image (e.g. for llama.cpp's CUDA and CPU runtimes).
Quick start
Install dependencies
Getting started requires either Docker or Podman. We also recommend the RamaLama CLI for a streamlined experience.
- Install Podman or Docker
- (Optional) Install RamaLama CLI (see the example commands below)
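As an illustration, on a Fedora-style system with Python available, installation might look like the following; package names and preferred install methods vary by platform, so check each project's install docs:
# install Podman from the distro repositories
sudo dnf install podman
# install the RamaLama CLI from PyPI
pip install ramalama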
Run the model
Model images bundle both the runtime and model, providing a single runnable container:
ramalama serve --image rlcr.io/ramalama/llamacpp-cpu-distroless rlcr://gemma3-270m:latest
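Once the container is up, you can sanity-check that the server is responding. This assumes the default port 8080 (the same one referenced in the GUI note below); the /v1/models listing is part of the OpenAI-compatible API most llama.cpp-based servers expose, but availability can vary by image:
# list the models the server is exposing
curl http://localhost:8080/v1/models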
You can find the full catalogue of RamaLama Labs images here.
Get chatting
The endpoint is OpenAI-compatible. Try a quick chat request:
ramalama chat "Say hello in one sentence"
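Because the endpoint follows the OpenAI API shape, any OpenAI-compatible client or plain curl works as well. The port and model name below are assumptions based on the defaults used elsewhere in this guide:
# send a chat completion request to the OpenAI-compatible endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "gemma3-270m",
        "messages": [{"role": "user", "content": "Say hello in one sentence"}]
      }'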
Many of our images come bundled with a web server GUI. If you'd prefer to chat directly with the agent, you can access it at the root URL where the agent is being served (e.g. http://localhost:8080).
Next steps