Setup and Architecture Explanation#
Tooling#
Before we start programming, we want an idea of the architecture we’re targeting and of what we want to happen internally. The AMD AI Engine has a unique architecture in which different levels of parallelism can be utilized. To show how you can bring your own kernels onto the architecture, we first explain the different components of the AI Engine. We then show an example of a simple AIE vector addition and how different but similar code synthesizes onto the actual target architecture.
The AI Engine suite of tools includes a compiler and both a software and a hardware emulator. Vitis itself includes features to trace signals, generate dataflow graphs, show data movement over hardware and cycle time, profile the kernel and I/O, and much more. While compilation and emulation can be done from command-line tools, more advanced analysis requires the Vitis IDE. All of these tools are freely available for download from AMD. You will also need an AMD AIE build license.
Background#
The AI Engine is a technology developed by Xilinx and adopted by AMD after its acquisition of Xilinx, finalized on Valentine’s Day 2022 (February 14th). [1] [2] Its successor, the AI Engine-ML (AIE-ML), makes a few changes to the architecture and makes up AMD’s NPU. This new architecture is referred to as XDNA. [3]
The AMD AI Engine has gone through a few iterations that change the support and architecture. As of May 2025, there are three iterations of the AIE. Confusingly, because the term AIE refers both to the technology and to its first generation, the AIE-ML and later generations are sometimes also referred to as AIE. For the sake of clarity, we will refer to the technology as the AIE, and to specific versions as:
AIE1: First generation of the AIE. For clarity we refer to it as AIE1, and not simply as AIE.
AIE-ML: Addition of an AIE Memory Tile. These make up XDNA.
AIE-MLv2: Second generation of AIE-ML. These make up XDNA2.
Note: AIE1 and AIE-ML have also been referred to as AIE-1 and AIE-2. Please do not use these terms.
The AIE is integrated in AMD’s CPUs (referred to as the Ryzen AI Series) and in certain AMD Versal FPGA SoCs. Versal kept its name through the Xilinx acquisition, as it was originally named and planned by Xilinx back in 2018. [8] Because only certain Versal SoCs include the AIE, we include a small list of their products here. We also make an important distinction: AMD’s Alveo accelerator cards are not part of this list, with the exception of the Alveo V80 and Alveo V70, which are Versal SoCs (hence the V in the naming convention). [5] The VCK5000 is another outlier.
First Generation Versal SoC Series Table
| Series | Products | AIE-Type | Number | Notes |
|---|---|---|---|---|
| Versal Premium | VPK120 VPK180 | N/A | 0 | Eval Kit |
| | VP2502 | AIE1 | 472 | |
| | VP2802 | AIE1 | 472 | |
| | VP1002 VP1052 VP1102 VP1202 VP1402 VP1502 VP1552 VP1702 VP1802 VP1902 | N/A | 0 | |
| Versal AI Edge | VEK280 | AIE-ML | 304 | Eval Kit |
| | VE2002 | AIE-ML | 8 | |
| | VE2102 | AIE-ML | 12 | |
| | VE2202 | AIE-ML | 24 | |
| | VE2302 | AIE-ML | 34 | |
| | VE2602 | AIE-ML | 152 | |
| | VE2802 | AIE-ML | 304 | |
| | VE1752 | AIE1 | 304 | |
| Versal RF | VR1602 | AIE1 | 126 | |
| | VR1652 | AIE1 | 126 | |
| | VR1902 | AIE1 | 120 | |
| | VR1952 | AIE1 | 120 | |
| Versal AI Core | VCK190 | AIE1 | 400 | Eval Kit |
| | VC1502 | AIE1 | 198 | |
| | VC1702 | AIE1 | 304 | |
| | VC1802 | AIE1 | 300 | |
| | VC1902 | AIE1 | 400 | |
| | VC2602 | AIE-ML | 152 | |
| | VC2802 | AIE-ML | 304 | |
| | ??? Alveo V70 | AIE-MLv2 | 0 | |
| | VC1902 | AIE1 | 400 | VCK5000 (Eval Board) [10] |
| Versal Prime | VMK180 | N/A | 0 | Eval Kit |
| | VM1102 VM1302 VM1402 VM1502 VM1802 VM2152 VM2202 VM2302 VM2502 VM2902 | N/A | 0 | |
| Versal HBM | VHK158 | N/A | 0 | Eval Kit |
| | VM1102 VM1302 VM1402 VM1502 VM1802 VM2152 | N/A | 0 | |
| | XCV80 | N/A | 0 | Alveo V80 |
Discontinuation notice of FPGA boards. [6]
Second Generation Versal SoC Series Table
| Series | Products | AIE-Type | Number | Notes |
|---|---|---|---|---|
| Versal Premium Gen 2 | 2VP3102 2VP3202 2VP3402 2VP3602 | N/A | 0 | |
| AI Edge Gen 2 | 2VE3304 2VE3358 | AIE-MLv2 | 24 | “The Versal AI Edge Series Gen 2 is currently in Early Access.” [v-edge-gen2] |
| | 2VE3504 | AIE-MLv2 | 96 | |
| | 2VE3558 | AIE-MLv2 | 96 | |
| | 2VE3804 | AIE-MLv2 | 144 | |
| | 2VE3858 | AIE-MLv2 | 144 | |
| Versal Prime Gen 2 | 2VM3358 2VM3558 2VM3654 2VM3858 | N/A | 0 | |
Again, other Alveo cards are not included since they are not Versal.
A much more reliable way to check the number and generation of AIEs is to open Vitis, select your target, and open an Array View. Given the high number of SoC variations and boards, the tables above are likely to contain a mistake or a missing entry.
CPUs are much easier. In this context the AIE is always an NPU, of either the XDNA or the XDNA2 architecture. These architectures use AIE-ML and AIE-MLv2, respectively.
AMD makes the list of AMD Ryzen AI CPUs much easier to find. The generations that have NPUs are Ryzen 7040 (“Phoenix”), Ryzen 8040 (“Hawk Point”), and Ryzen 300 (“Strix Point”). [12] There are also mentions of Krackan Point, a mobile APU. [11] You can view a complete list of processors that have an NPU at AMD Ryzen AI.
| Series | AIE-Type | Number | Notes |
|---|---|---|---|
| Ryzen AI 7040 (Phoenix) | AIE-ML | 16 | NPU XDNA Arch. |
| Ryzen 8040 (Hawk Point) | AIE-ML | 16 | NPU XDNA Arch. |
| Ryzen AI 300 (Strix Point) | AIE-MLv2 | 32 | (Including Ryzen AI Max) NPU XDNA2 Arch. |
Architecture Differences#
Since the naming scheme does little to distinguish the differences across versions, we’ve dedicated a section to explaining them.
| Feature | AIE1 Tile | AIE-ML Tile | AIE-MLv2 Tile (lack of documentation) |
|---|---|---|---|
| Clock Speed [*] | 1 GHz | 1 GHz | ?? |
| Per Tile Data Memory | 32 KiB, 4 memory banks | 64 KiB + (125-750 KiB shared), 8 memory banks shared w/ 3 neighbors | ?? |
| Int Type Support (bit) | 8, 16, 32 (AIE function list) | 4, 8, 16, 32 (AIE-ML function list) | ?? |
| Vector Int Type Support (bit) | 8, 16, 32 | 4, 8, 16 | ?? |
| Floating Point Support | 32 | 16, 32 | AIE-ML + BF16 |
| Vector Floating Point Support | 32 | None | ?? |
| Interconnects | Shared memory buffer. 2 input and 2 output streaming buffers | Shared memory buffer. 1 input and 1 output streaming buffer | |
| AI Engine ML Memory Tile (not part of the AIE tile, but always included alongside) | No | Yes | Yes |
| Cascade Stream | Horizontal direction. Rows alternate directions. | Two streams. One directed down and the other right. | |
| On XDNA | | 4 rows, 5 columns | 4 rows, 8 columns |
Clock, voltage, frequency, power distribution, PL and NoC interface, and debug/trace functionality are the same across AIE1 and AIE-ML. [18]
Single-precision floating point is supported, but it does not meet all specifications and may have rounding errors. [4] In fact, some operation types are not native and are instead carried out by converting to another type. Therefore, even where a type is listed as supported, it is important to check whether the architecture supports the operation natively.
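To make the rounding concern concrete, here is a minimal Python sketch of what a narrowing conversion can cost. It assumes a bfloat16-style 16-bit format with round-to-nearest-even. The helper names (`float32_bits`, `bits_to_float32`, `to_bfloat16`) are hypothetical, and this is a generic illustration rather than a description of how the AIE performs its conversions.

```python
import struct


def float32_bits(x: float) -> int:
    """Return the IEEE 754 binary32 bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float32(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def to_bfloat16(x: float) -> float:
    """Round x to a bfloat16-style value (8 exponent bits, 7 mantissa bits).

    Assumption: round-to-nearest-even on the upper 16 bits of the float32
    pattern. NaN and infinity are not handled specially in this sketch.
    """
    bits = float32_bits(x)
    lsb = (bits >> 16) & 1                      # lowest bit that survives
    rounded = (bits + 0x7FFF + lsb) & 0xFFFFFFFF
    return bits_to_float32(rounded & 0xFFFF0000)  # drop the low 16 bits


if __name__ == "__main__":
    for value in (1.0, 3.14159, 1.0 + 2**-8):
        print(value, "->", to_bfloat16(value))
```

With only 7 mantissa bits, `to_bfloat16(3.14159)` returns `3.140625`, and `1.0 + 2**-8` collapses back to `1.0`. An operation that is emulated through such a conversion can therefore differ from a natively computed 32-bit result.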
So far, XDNA2 is only available to customers in Strix Point. While details on the AIE-MLv2 are currently extremely limited, we list a few of them here. [20] The scaling figures are relative to the previous XDNA generation.
- XDNA2
  - 50 INT8 TOPS and 50 Block FP16 TFLOPS
  - 1.6x on-chip memory
  - Block floating point support
  - Better support for non-linear functions
  - 50% weight sparsity
  - 2x more concurrent spatial streams (up to 8 concurrent isolated spatial streams)
  - Per-column power gating
  - 2x Perf/W
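Since block floating point appears in the list above without explanation, the following Python sketch shows the general idea: a block of values shares one exponent, and each value keeps only a small signed integer mantissa. This is a generic illustration under assumed parameters. The function names, the block contents, and the 8-bit mantissa width are all hypothetical, and it is not the Block FP16 format used by XDNA2.

```python
import math


def block_quantize(values, mantissa_bits=8):
    """Quantize a block of floats to integer mantissas with one shared exponent.

    Assumption: the shared exponent is taken from the largest magnitude in
    the block, and mantissas are signed integers of `mantissa_bits` bits.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0, [0] * len(values)
    # frexp gives max_abs = m * 2**exp with 0.5 <= m < 1
    _, exp = math.frexp(max_abs)
    scale = 2 ** (mantissa_bits - 1)
    limit = scale - 1
    mantissas = [
        max(-limit, min(limit, round(v / 2**exp * scale))) for v in values
    ]
    return exp, mantissas


def block_dequantize(exp, mantissas, mantissa_bits=8):
    """Reconstruct approximate floats from a shared exponent and mantissas."""
    scale = 2 ** (mantissa_bits - 1)
    return [m / scale * 2**exp for m in mantissas]


if __name__ == "__main__":
    block = [1.0, 0.5, -0.25, 0.001]
    exp, mantissas = block_quantize(block)
    print(exp, mantissas)
    print(block_dequantize(exp, mantissas))
```

In this example the three larger values survive exactly, while `0.001` is flushed to zero because it is too small relative to the shared exponent. That trade-off (cheap integer arithmetic per element, precision set by the largest value in the block) is the usual motivation for the format.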