> ## Documentation Index
> Fetch the complete documentation index at: https://docs.ramalama.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Introduction

> Overview of the RamaLama Python SDK.

<div style={{textAlign: "center"}}>
  <img src="https://mintcdn.com/ramalamalabs/jjbjc4PjbjSxjyIk/assets/logo.webp?fit=max&auto=format&n=jjbjc4PjbjSxjyIk&q=85&s=4728d9e460f317bdded7dd80ce1aa7d9" alt="RamaLama Labs Logo" style={{width: "120px", margin: "0 auto", display: "block"}} width="291" height="600" data-path="assets/logo.webp" />
</div>

The RamaLama Python SDK wraps the RamaLama CLI to provision and run local models from your apps.

Use it when you want local-first inference with the same container-based model provisioning as the CLI.

## Overview

The Python SDK provides a local-first developer experience for running AI models on device. It wraps the RamaLama CLI to provision models in containers and exposes a simple API for inference in your apps.

Core capabilities include:

* LLM: local chat with OpenAI-compatible HTTP endpoints for direct requests.
* STT: speech-to-text with Whisper models running on device.

## Capabilities

<CardGroup cols={2}>
  <Card title="Chat" icon="message" href="/sdk/python/capabilities/chat">
    Send chat completion requests to a running model server.
  </Card>

  <Card title="Speech-to-Text" icon="microphone" href="/sdk/python/capabilities/speech-to-text">
    Local transcription with Whisper models (coming soon).
  </Card>
</CardGroup>

## Key Capabilities

* Container-native model provisioning with the RamaLama CLI.
* Flexible model sources (HuggingFace, Ollama, ModelScope, OCI registries, local files, URLs).
* Local-first inference to minimize latency and protect data.
* Model lifecycle control (download, serve, stop) from code.

## Core Philosophy

* On-device first
* Container-native by default
* Privacy-focused
* Developer-friendly APIs

## Features

### Language Models (LLM)

* Local chat with a simple SDK interface.
* OpenAI-compatible HTTP endpoint for direct requests.
* Bring-your-own model sources through the RamaLama CLI.

### Speech-to-Text (STT)

* Local transcription with Whisper models.
* Works entirely on device.

### Model Management

* Download and cache models locally.
* Start and stop model servers programmatically.
* Use the same model catalog and resolution as the CLI.

## System Requirements

| Requirement       | Notes                                |
| ----------------- | ------------------------------------ |
| RamaLama CLI      | Installed and available on your PATH |
| Container manager | Docker or Podman                     |
| Local storage     | Space for model downloads            |

## Next steps

* [Install the SDK](/sdk/python/installation)
* [Run the quick start](/sdk/python/quickstart)
* [Explore chat capabilities](/sdk/python/capabilities/chat)
