> ## Documentation Index
> Fetch the complete documentation index at: https://docs.ramalama.com/llms.txt
> Use this file to discover all available pages before exploring further.

# cann

> Platform-specific setup guide

# Setting Up RamaLama with Ascend NPU Support on Linux systems

This guide walks through the steps required to set up RamaLama with Ascend NPU support.

* [Background](#background)
* [Hardware](#hardware)
* [Model](#model)
* [Docker](#docker)

## Background

**Ascend NPU** is a range of AI processors using Neural Processing Unit. It will efficiently handle matrix-matrix multiplication, dot-product and scalars.

**CANN** (Compute Architecture for Neural Networks) is a heterogeneous computing architecture for AI scenarios, providing support for multiple AI frameworks on the top and serving AI processors and programming at the bottom. It plays a crucial role in bridging the gap between upper and lower layers, and is a key platform for improving the computing efficiency of Ascend AI processors. Meanwhile, it offers a highly efficient and easy-to-use programming interface for diverse application scenarios, allowing users to rapidly build AI applications and services based on the Ascend platform.

## Hardware

### Ascend NPU

**Verified devices**

Table Supported Hardware List:

| Ascend NPU                     | Status  |
| ------------------------------ | ------- |
| Atlas A2 Training series       | Support |
| Atlas 800I A2 Inference series | Support |

*Notes:*

* If you have trouble with Ascend NPU device, please create an issue with **\[CANN]** prefix/tag.
* If you are running successfully with an Ascend NPU device, please help update the "Supported Hardware List" table above.

## Model

Currently, Ascend NPU acceleration is only supported when the llama.cpp backend is selected. For supported models, please refer to the page [llama.cpp/backend/CANN.md](https://github.com/ggml-org/llama.cpp/blob/master/docs/backend/CANN.md).

## Docker

### Install the Ascend driver

This provides NPU acceleration using the AI cores of your Ascend NPU. And [CANN](https://www.hiascend.com/en/software/cann) is a hierarchical APIs to help you to quickly build AI applications and service based on Ascend NPU.

For more information about Ascend NPU in [Ascend Community](https://www.hiascend.com/en/).

Make sure to have the CANN toolkit installed. You can download it from here: [CANN Toolkit](https://www.hiascend.com/developer/download/community/result?module=cann)
Make sure the Ascend Docker runtime is installed. You can download it from here: [Ascend-docker-runtime](https://www.hiascend.com/document/detail/en/mindx-dl/300/dluserguide/clusterscheduling/dlug_installation_02_000025.html)

### Build Images

Go to `ramalama` directory and build using make.

```bash theme={"system"}
make build IMAGE=cann
make install
```

You can test with:

```bash theme={"system"}
export ASCEND_VISIBLE_DEVICES=0
ramalama --image quay.io/ramalama/cann:latest serve -d -p 8080 -name ollama://smollm:135m
```

In a window see the running podman container.

```bash theme={"system"}
$ podman ps
CONTAINER ID   IMAGE                                                         COMMAND                  CREATED             STATUS             PORTS                                          NAMES
80fc31c131b0   quay.io/ramalama/cann:latest                                  "/bin/bash -c 'expor…"   About an hour ago   Up About an hour                                                  ame
```

Other using guides see RamaLama ([README.md](https://github.com/containers/ramalama/blob/main/README.md))

***

*Mar 2025, Originally compiled*