Run a model
Context Managers
The context manager will automatically manage and clean up running models on your behalf.
Manual Management
It’s also possible to manually manage the model’s run state.
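The difference between the two approaches can be sketched with a stand-in class. This is a minimal illustration of the pattern only: `StubModel`, its `serve()` and `stop()` methods, and the model name `tinyllama` are hypothetical placeholders, not the SDK's confirmed API.

```python
class StubModel:
    """Hypothetical stand-in for an SDK model object; not the real API."""

    def __init__(self, model: str) -> None:
        self.model = model
        self.running = False

    def serve(self) -> None:
        # Assumed name for "start the model server".
        self.running = True

    def stop(self) -> None:
        # Assumed name for "shut the model server down".
        self.running = False

    def __enter__(self) -> "StubModel":
        self.serve()
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Runs even if the body raised, so the model is always stopped.
        self.stop()


# Context-managed: startup and cleanup are implicit.
with StubModel("tinyllama") as managed:
    was_running_inside = managed.running

# Manual: the caller owns cleanup, so guard it with try/finally.
manual = StubModel("tinyllama")
manual.serve()
try:
    was_running_manually = manual.running
finally:
    manual.stop()
```

With manual management, the `try`/`finally` guard is what keeps a failed request from leaving a model running; the context manager provides the same guarantee without the extra bookkeeping.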
Download models
Use download() to fetch and cache models before serving.
The model identifier controls where the SDK pulls from.
Common prefixes include:
- HuggingFace: hf://
- Ollama: ollama://
- OCI (any OCI image repository): oci://
- ModelScope: modelscope://
- File: file://
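To make the prefix convention concrete, here is a small sketch that splits an identifier into its prefix and the remainder. The helper `split_identifier` and the example identifiers are hypothetical; the SDK's own resolution logic, including what it does when no prefix is given, is not shown in this page.

```python
from typing import Optional, Tuple

# Prefixes listed above; the SDK may accept others.
KNOWN_PREFIXES = ("hf", "ollama", "oci", "modelscope", "file")


def split_identifier(identifier: str) -> Tuple[Optional[str], str]:
    """Return (prefix, remainder); prefix is None when none of the known ones match."""
    scheme, sep, rest = identifier.partition("://")
    if sep and scheme in KNOWN_PREFIXES:
        return scheme, rest
    # No recognized prefix: hand the identifier back untouched.
    return None, identifier


hf_parts = split_identifier("hf://example-org/example-model")
bare_parts = split_identifier("tinyllama")
file_parts = split_identifier("file:///tmp/model.gguf")
```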
Instantiating a model
You can pass runtime overrides when creating a model session:

| Parameter | Type | Description | Default |
|---|---|---|---|
| model | str | Model name or identifier. | required |
| base_image | str or None | Container image to use for serving, if different from config. | None |
| temp | float or None | Temperature override for sampling. | None |
| ngl | int or None | GPU layers override. | None |
| max_tokens | int or None | Maximum tokens for completions. | None |
| threads | int or None | CPU threads override. | None |
| ctx_size | int or None | Context window override. | None |
| timeout | int | Seconds to wait for server readiness. | 30 |
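
As a rough illustration of how these parameters fit together, the sketch below mirrors the table in a dataclass. `ModelOptions` and its `overrides()` helper are hypothetical and not part of the SDK; only the field names, types, and defaults come from the table.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class ModelOptions:
    """Hypothetical mirror of the parameter table; not the SDK's own class."""

    model: str                        # required
    base_image: Optional[str] = None  # container image for serving
    temp: Optional[float] = None      # sampling temperature
    ngl: Optional[int] = None         # GPU layers
    max_tokens: Optional[int] = None  # completion token limit
    threads: Optional[int] = None     # CPU threads
    ctx_size: Optional[int] = None    # context window
    timeout: int = 30                 # seconds to wait for readiness

    def overrides(self) -> Dict[str, Any]:
        # Only optional fields that were explicitly set; None means "use config".
        skip = ("model", "timeout")
        return {k: v for k, v in asdict(self).items()
                if k not in skip and v is not None}


opts = ModelOptions(model="tinyllama", temp=0.2, ctx_size=4096)
active = opts.overrides()
```

Here a value of `None` is read as "no override", matching the table's defaults, so only `temp` and `ctx_size` end up in `active`.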

