# Local Environment

> Configure your machine for running RamaLama locally.

RamaLama allows you to run AI workloads on your laptop just as easily as you run them in the cloud.
The CLI helps whether you're running a coding agent locally or developing a reproducible local environment that matches production.

## Prerequisites

1. [Podman](https://podman.io/docs/installation) or [Docker](https://docs.docker.com/get-docker/) installed (recommended)
2. [RamaLama](/pages/getting_started/oss) installed (`pip install ramalama` or `dnf install python3-ramalama`)
3. **Optional**: GPU drivers/runtime (NVIDIA Container Toolkit, AMD ROCm, etc.)

Check that each prerequisite is installed and on your `PATH` before continuing.
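One way to check the prerequisites is to ask the shell whether each tool is on your `PATH`. The sketch below is an illustration, not an official RamaLama check: `check_tool` is a hypothetical helper, and `nvidia-smi` and `rocminfo` are assumed to be the utilities shipped with the NVIDIA and AMD ROCm stacks. A missing optional tool is reported but does not cause a failure.

```bash
# check_tool NAME
# Hypothetical helper: reports whether NAME is available on PATH.
check_tool() {
  # command -v is the portable way to test for a command
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

# Container engines and RamaLama first, then the optional GPU utilities.
for tool in podman docker ramalama nvidia-smi rocminfo; do
  check_tool "$tool"
done
```

Only one of `podman` or `docker` needs to be found, and the GPU utilities matter only if you plan to use hardware acceleration.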

## Serve a model locally

Start a REST API on port 8080 in the background:

```bash theme={"system"}
ramalama serve --image rlcr.io/ramalama/llamacpp-cpu-distroless -d -p 8080 rlcr://gemma3-270m
```

Interact via the OpenAI-compatible API:

<CodeGroup>
  ```bash title="RamaLama" theme={"system"}
  ramalama chat "Say hello in one sentence"
  ```

  ```bash title="curl" theme={"system"}
  curl -s http://localhost:8080/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{
          "model": "gemma3-270m",
          "messages": [
            {"role": "user", "content": "Say hello in one sentence"}
          ]
        }'
  ```

</CodeGroup>

<Tip>
  Browse the full catalogue of RamaLama Labs images in the [registry](https://registry.ramalama.com/projects/ramalama).
</Tip>

Example responses:

<CodeGroup>
  ```text title="RamaLama" theme={"system"}
  Hello!
  ```

  ```json title="curl" theme={"system"}
  {
      "id":"chatcmpl-ZYtHxmjGSdIHs7tqMlA6eS9NhctuDZ6Y",
      "model":"gemma3-270m",
      "object":"chat.completion",
      "choices":[{"finish_reason":"stop","index":0,"message":{"role":"assistant","content":"Hello! "}}]
  }
  ```
</CodeGroup>
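To reuse the curl request with other prompts, you can wrap it in a small shell function. This is a minimal sketch under stated assumptions: `chat_payload` and `ask` are hypothetical helper names, the server is the one started above on port 8080, and the prompt contains no double quotes or backslashes, because `printf` does not JSON-escape its arguments.

```bash
# chat_payload MODEL PROMPT
# Hypothetical helper: prints a chat-completions request body.
# Assumes MODEL and PROMPT need no JSON escaping.
chat_payload() {
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}]}\n' "$1" "$2"
}

# ask PROMPT
# Sends one prompt to the local server; curl reads the body from stdin (-d @-).
ask() {
  chat_payload "gemma3-270m" "$1" |
    curl -s http://localhost:8080/v1/chat/completions \
      -H 'Content-Type: application/json' \
      -d @-
}
```

With the server running, `ask "Say hello in one sentence"` sends the same request as the curl example.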

List and stop containers:

```bash theme={"system"}
ramalama containers
ramalama stop --all
```

## GPU acceleration

RamaLama detects your hardware and automatically picks a matching accelerated image (for example `quay.io/ramalama/cuda`, `quay.io/ramalama/rocm`, or `quay.io/ramalama/intel-gpu`). To override the choice, pass `--image`:

```bash theme={"system"}
ramalama serve -d -p 8080 --image rlcr.io/ramalama/llamacpp-cpu-distroless llama3
```

If you use Docker with NVIDIA GPUs, make sure the NVIDIA Container Toolkit is installed and that your `docker run` or Compose configuration grants the container GPU access (for example, `--gpus all` with `docker run`).
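Before relying on automatic image selection, it can help to confirm that the host itself sees a GPU. The sketch below is a rough host-side check and not RamaLama's own detection logic: `visible_accelerator` is a hypothetical helper, and it assumes that `nvidia-smi -L` succeeds only when an NVIDIA driver is working and that `/dev/kfd` exists only when the AMD ROCm kernel driver is loaded.

```bash
# visible_accelerator
# Hypothetical helper: prints "nvidia", "amd", or "none" as a rough guess.
visible_accelerator() {
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1; then
    echo "nvidia"   # NVIDIA driver responds and lists its devices
  elif [ -e /dev/kfd ]; then
    echo "amd"      # ROCm compute device node is present
  else
    echo "none"     # no accelerator seen by these two checks
  fi
}

visible_accelerator
```

A result of `none` means only that these two checks found nothing; other accelerators, such as Intel GPUs, are not covered by this sketch.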

## Data and storage

Models are stored under your user data directory (e.g., `~/.local/share/ramalama`).
Use `ramalama list` to see downloaded models and `ramalama rm` to remove them.
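Downloaded models can take up a lot of disk space, so it is worth checking the size of the store occasionally. The sketch below assumes the store lives at `~/.local/share/ramalama` as in the example path above; `model_store_dir` is a hypothetical helper, and the `XDG_DATA_HOME` fallback is an assumption rather than documented RamaLama behavior.

```bash
# model_store_dir
# Hypothetical helper: prints the assumed model store path.
# Honoring XDG_DATA_HOME here is an assumption, not confirmed behavior.
model_store_dir() {
  echo "${XDG_DATA_HOME:-$HOME/.local/share}/ramalama"
}

store="$(model_store_dir)"
if [ -d "$store" ]; then
  # Summarize the total size of everything under the store
  du -sh "$store"
else
  echo "no model store at $store"
fi
```

If the reported size is larger than expected, `ramalama list` shows which models are present and `ramalama rm` removes the ones you no longer need.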

## Security defaults

RamaLama runs models in rootless containers with `--network=none`, read-only model mounts, and `--rm` cleanup.

## Next steps

<CardGroup cols={2}>
  <Card title="Docker Compose" icon="docker" href="/pages/deploying/compose">
    Deploy multi-container AI workloads with Docker Compose
  </Card>

  <Card title="Kubernetes" icon="dharmachakra" href="/pages/deploying/kubernetes">
    Scale your AI deployments on Kubernetes clusters
  </Card>
</CardGroup>