
## Overview
The Python SDK provides a local-first developer experience for running AI models on device. It wraps the RamaLama CLI to provision models in containers and exposes a simple API for inference in your apps. Core capabilities include:

- LLM: local chat with OpenAI-compatible HTTP endpoints for direct requests.
- STT: speech-to-text with Whisper models running on device.

## Capabilities

### Chat
Send chat completion requests to a running model server.
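
The server speaks the OpenAI chat-completions protocol, so any OpenAI-compatible client can talk to it. A minimal sketch using the `openai` Python package; the port (8080) and model name are assumptions, so match them to whatever your local server reports:

```python
from openai import OpenAI

# Point the client at the local model server instead of the OpenAI cloud.
# The base URL and model name are assumptions -- adjust to your setup.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="tinyllama",  # hypothetical model name
    messages=[{"role": "user", "content": "Explain containers in one sentence."}],
)
print(response.choices[0].message.content)
```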

### Speech-to-Text
Local transcription with Whisper models (coming soon).

## Key Capabilities
- Container-native model provisioning with the RamaLama CLI.
- Flexible model sources (HuggingFace, Ollama, ModelScope, OCI registries, local files, URLs); see the example references after this list.
- Local-first inference to minimize latency and protect data.
- Model lifecycle control (download, serve, stop) from code.
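
For illustration, here are model references in the style the RamaLama CLI resolves. The scheme prefixes and repository names below are assumptions, so verify them against your installed CLI:

```python
# Example model references, one per source type. The scheme prefixes and
# repo names are assumptions -- verify against your RamaLama version.
MODEL_REFERENCES = [
    "huggingface://TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",  # Hugging Face
    "ollama://tinyllama",                                    # Ollama registry
    "modelscope://Qwen/Qwen2.5-0.5B-Instruct-GGUF",          # ModelScope
    "oci://quay.io/example/model:latest",                    # OCI registry
    "/models/local-model.gguf",                              # local file
    "https://example.com/models/model.gguf",                 # direct URL
]
```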

## Core Philosophy
- On-device first
- Container-native by default
- Privacy-focused
- Developer-friendly APIs

## Features

### Language Models (LLM)
- Local chat with a simple SDK interface.
- OpenAI-compatible HTTP endpoint for direct requests (see the raw-HTTP sketch after this list).
- Bring-your-own model sources through the RamaLama CLI.
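
Because the endpoint is plain HTTP, direct requests need no SDK at all. A sketch using the `requests` library, again assuming a server on port 8080 and a hypothetical model name:

```python
import requests

# POST a chat completion to the local OpenAI-compatible endpoint.
# The URL and model name are assumptions -- match your running server.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "tinyllama",
        "messages": [{"role": "user", "content": "Say hello."}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```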

### Speech-to-Text (STT)
- Local transcription with Whisper models (coming soon).
- Works entirely on device.

### Model Management
- Download and cache models locally.
- Start and stop model servers programmatically (see the CLI sketch after this list).
- Use the same model catalog and resolution as the CLI.
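
This overview does not show the SDK's own lifecycle API, so as an illustration of what it wraps, here is a subprocess sketch around the underlying CLI. `pull`, `serve`, and `stop` are real RamaLama subcommands, but the `--detach` and `--name` flags are assumptions to confirm against `ramalama serve --help`:

```python
import subprocess

MODEL = "ollama://tinyllama"  # any supported model reference works here

# Download and cache the model locally.
subprocess.run(["ramalama", "pull", MODEL], check=True)

# Serve the model in a container in the background. The --detach and
# --name flags are assumptions -- confirm with `ramalama serve --help`.
subprocess.run(
    ["ramalama", "serve", "--detach", "--name", "demo-server", MODEL],
    check=True,
)

# ... send inference requests to the local endpoint ...

# Stop the named server when done.
subprocess.run(["ramalama", "stop", "demo-server"], check=True)
```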

## System Requirements
| Requirement | Notes |
|---|---|
| RamaLama CLI | Installed and available on your PATH |
| Container manager | Docker or Podman |
| Local storage | Free disk space for downloaded models (often several GB each) |

