The chat() method sends a chat completion request to a running model server and returns a ChatMessage payload. It is a simple API for quick prompts when you do not need to call the HTTP endpoint directly.

Context Manager

The SDK provides both synchronous and asynchronous model APIs for chat. Using the synchronous model as a context manager handles the model's lifecycle for you (an asynchronous sketch follows the example below):
from ramalama_sdk import RamalamaModel

with RamalamaModel(model="tinyllama") as model:
    response = model.chat("What is the capital of France?")
    print(response["content"])  # The capital of France is Paris.
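
For the asynchronous API, a minimal sketch follows the same shape. The class name AsyncRamalamaModel and the awaitable chat() are assumptions here, not confirmed SDK names; check the SDK reference for the actual async entry point:

import asyncio

from ramalama_sdk import AsyncRamalamaModel  # hypothetical async counterpart

async def main() -> None:
    # Assumes the async model mirrors the sync context-manager lifecycle.
    async with AsyncRamalamaModel(model="tinyllama") as model:
        response = await model.chat("What is the capital of France?")
        print(response["content"])

asyncio.run(main())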

Before you call chat()

The server must be running before you call chat(). If you are not using a context manager, you'll need to manage the model's lifecycle yourself:
from ramalama_sdk import RamalamaModel

model = RamalamaModel(model="tinyllama")
model.download()
model.serve()

try:
    response = model.chat("Hello!")
    print(response["content"])
finally:
    model.stop()

Method signature

RamalamaModel.chat(message: str, history: list[ChatMessage] | None = None) -> ChatMessage

Parameters

Parameter  Type                        Description                            Default
message    str                         User prompt content.                   required
history    list[ChatMessage] or None   Optional prior conversation messages.  None

Returns

A ChatMessage typed dict containing the assistant's response.
Field    Type                                                  Description
role     Literal['system', 'user', 'assistant', 'developer']  Message author role.
content  str                                                   Message text content.
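
In other words, ChatMessage behaves like the following TypedDict. This definition is a sketch reconstructed from the table above, not the SDK's source:

from typing import Literal, TypedDict

class ChatMessage(TypedDict):
    # Roles mirror the OpenAI chat schema.
    role: Literal["system", "user", "assistant", "developer"]
    content: str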

Raises

  • RuntimeError if the server is not running.

Examples

Simple Q&A

answer = model.chat("What's 2 + 2?")
print(answer["content"])  # 4

Conversation flow

The SDK does not store history automatically; pass prior messages yourself. Because history holds only the messages before the current prompt, append the new user turn after the call rather than before it, or the question will be sent twice:
history = []

def ask(question: str) -> str:
    # Pass only prior turns; the current question goes in as the message argument.
    response = model.chat(question, history=history)
    history.append({"role": "user", "content": question})
    history.append({"role": "assistant", "content": response["content"]})
    return response["content"]

reply1 = ask("My name is Alice")
reply2 = ask("What's my name?")

Error handling

try:
    response = model.chat("Hello!")
    print(response["content"])
except RuntimeError as exc:
    print(f"Server not running: {exc}")

When to use chat() vs direct HTTP

Use case                                        Recommended approach
Quick responses                                 chat()
Custom payloads or full OpenAI schema control   Direct HTTP to /chat/completions
Interoperability with existing OpenAI clients   Direct HTTP to /chat/completions
For direct HTTP calls, see the quick start example that uses requests.
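
As a minimal sketch of such a call, assuming the server listens on localhost:8080 and speaks the OpenAI-compatible chat schema (the host, port, and response shape here are assumptions; check your serve configuration):

import requests

# Host and port are assumptions; adjust to match how the server was started.
url = "http://localhost:8080/chat/completions"
payload = {
    "model": "tinyllama",
    "messages": [{"role": "user", "content": "Hello!"}],
}

resp = requests.post(url, json=payload, timeout=60)
resp.raise_for_status()
# Assumes an OpenAI-compatible response body.
print(resp.json()["choices"][0]["message"]["content"])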