
## What is RamaLama?

RamaLama is an open-source tool for running AI models locally in containers. With its SDKs, you can integrate local inference into your apps while keeping data on device and minimizing latency. Once models are downloaded, inference can run fully offline.

## Core AI Capabilities

Every RamaLama SDK provides access to these core AI features:

### LLM (Large Language Model)
On-device chat with an OpenAI-compatible HTTP endpoint for direct requests.
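As a sketch of what "OpenAI-compatible" means in practice, the snippet below sends one chat turn to a locally served model over the standard `/v1/chat/completions` route, using only the Python standard library. The base URL, port, and model name are assumptions here; adjust them to match however your model is actually being served.

```python
import json
import urllib.request

# Assumed address of the local OpenAI-compatible server; adjust to your setup.
BASE_URL = "http://localhost:8080/v1"


def build_chat_request(prompt: str, model: str = "default") -> dict:
    """Build an OpenAI-style chat-completions payload for a single user turn."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(prompt: str) -> str:
    """POST one chat turn to the local endpoint and return the reply text."""
    payload = build_chat_request(prompt)
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-compatible servers put the reply at choices[0].message.content.
    return body["choices"][0]["message"]["content"]


# Usage (requires a model being served locally):
#   print(chat("Summarize RamaLama in one line."))
```

Because the endpoint follows the OpenAI wire format, any existing OpenAI client library should also work by pointing its base URL at the local server.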
### STT (Speech-to-Text)

Local transcription with Whisper models running on device.
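For the STT side, a minimal sketch, assuming the local server also exposes an OpenAI-style `/v1/audio/transcriptions` route (an assumption; check your server's documentation for the actual path). It assembles the multipart upload with the standard library only:

```python
import io
import json
import urllib.request
import uuid


def build_multipart(file_name: str, audio: bytes, model: str = "whisper") -> tuple[bytes, str]:
    """Assemble a multipart/form-data body with `model` and `file` fields."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()
    # Plain text field for the model name.
    body.write(
        f'--{boundary}\r\nContent-Disposition: form-data; name="model"\r\n\r\n{model}\r\n'.encode()
    )
    # Binary field carrying the audio file.
    body.write(
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{file_name}"\r\nContent-Type: application/octet-stream\r\n\r\n'.encode()
    )
    body.write(audio)
    body.write(f"\r\n--{boundary}--\r\n".encode())
    return body.getvalue(), f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, base_url: str = "http://localhost:8080/v1") -> str:
    """Upload an audio file to the assumed transcription route and return the text."""
    with open(path, "rb") as f:
        data, content_type = build_multipart(path, f.read())
    req = urllib.request.Request(
        f"{base_url}/audio/transcriptions",
        data=data,
        headers={"Content-Type": content_type},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["text"]


# Usage (requires a Whisper model being served locally):
#   print(transcribe("meeting.wav"))
```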
## Why RamaLama?

- Privacy by design: prompts and audio stay on device.
- Low latency: no network round trip for inference.
- Offline capable: once models are downloaded, no connection is required.
- Container-native model provisioning: models are pulled and served in containers.
## Supported SDKs
| Platform | Status | Installation | Documentation |
|---|---|---|---|
| Python | Active development | `pip install ramalama-sdk` | /sdk/python/introduction |
| TypeScript | Planned | Coming soon | /sdk/typescript |
| Go | Planned | Coming soon | /sdk/go |
| Rust | Planned | Coming soon | /sdk/rust |
## Get Started

1. Choose your SDK from the list above.
2. Install the SDK for your platform.
3. Initialize and build with the quick start guide.

