Setup and Architecture Explanation

Tooling

Before we start programming, we first want an idea of the architecture we're targeting and what we want to happen internally. The AMD AI Engine has a unique architecture that offers several levels of parallelism. To show how you can bring your own kernels to the architecture, we first explain the different components of the AI Engine. Then we show a simple AIE vector addition example and how the slightly different versions of that code map onto each target architecture.
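To make the idea of multiple levels of parallelism concrete, here is a rough mental model in plain, portable C++. It does not use the AIE API and is not an actual AIE kernel. A vector addition is first split into chunks, one per tile, and each tile then processes its chunk in fixed-width vector groups. The tile count `kTiles`, the lane width `kLanes`, and the function names are hypothetical placeholders, not values or APIs for any particular device.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical parameters chosen only for illustration (not real device values).
constexpr std::size_t kTiles = 4;   // spatial parallelism: independent compute tiles
constexpr std::size_t kLanes = 16;  // vector parallelism: SIMD lanes per tile

// Work done by one "tile": add its chunk kLanes elements at a time.
// On real hardware each group would be a single vector instruction;
// here the lane loop only stands in for that operation.
void tile_vector_add(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + kLanes <= n; i += kLanes) {
        std::array<float, kLanes> lanes{};
        for (std::size_t l = 0; l < kLanes; ++l) lanes[l] = a[i + l] + b[i + l];
        std::copy(lanes.begin(), lanes.end(), out + i);
    }
    for (; i < n; ++i) out[i] = a[i] + b[i];  // scalar tail for leftover elements
}

// Split the inputs (assumed to be the same length) into one contiguous chunk per tile.
// The tiles run one after another here; on the AIE array they would run concurrently.
std::vector<float> vector_add(const std::vector<float>& a, const std::vector<float>& b) {
    std::vector<float> out(a.size());
    const std::size_t chunk = (a.size() + kTiles - 1) / kTiles;
    for (std::size_t t = 0; t < kTiles; ++t) {
        const std::size_t begin = t * chunk;
        if (begin >= a.size()) break;
        const std::size_t len = std::min(chunk, a.size() - begin);
        tile_vector_add(a.data() + begin, b.data() + begin, out.data() + begin, len);
    }
    return out;
}
```

On the real array, the per-tile function would typically become a kernel and the split across tiles would be expressed by how kernels are placed and connected; the sequential outer loop here only stands in for tiles working side by side.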

The AI Engine tool suite includes a compiler and both a software and a hardware emulator. Vitis itself includes features to trace signals, generate dataflow graphs, show data movement across the hardware and cycle times, profile kernels and I/O, and much more. While compiling and emulating can be done with command-line tools, the more advanced analysis features require the Vitis IDE. All of these tools are free to download from AMD, but you will also need an AMD AIE build license.

Background

The AI Engine is a technology developed by Xilinx and then adopted by AMD after its acquisition of Xilinx, finalized on Valentine's Day 2022 (February 14). [1] [2] Its successor, the AI Engine-ML (AIE-ML), makes a few changes to the architecture and forms AMD's NPU. This new architecture is referred to as XDNA. [3]

The AMD AI Engine has gone through a few iterations, each changing its feature support and architecture. As of May 2025, there are three iterations of the AIE. Confusingly, because the term AIE refers both to the technology and to its first generation, the AIE-ML and later generations are sometimes also called an AIE. For the sake of clarity, we use AIE for the technology as a whole and refer to the specific versions as:

  1. AIE1: First generation of the AIE. For clarity, we refer to it as AIE1, not simply as AIE.

  2. AIE-ML: Adds an AIE memory tile. AIE-ML tiles make up XDNA.

  3. AIE-MLv2: Second generation of AIE-ML. These make up XDNA2.

Note: AIE1 and AIE-ML have also been referred to as AIE-1 and AIE-2, respectively. Please do not use these terms.

The AIE is integrated into AMD's CPUs (the Ryzen AI series) and into certain AMD Versal FPGA SoCs. The Versal name survived the acquisition, as Xilinx had already announced and planned Versal back in 2018. [8] Because only certain Versal SoCs include the AIE, we include a short list of them here. We also make an important distinction: AMD's Alveo accelerator cards are generally not part of this list, but the Alveo V80 and Alveo V70 are included because they are built on Versal SoCs, hence the V in their names. [5] The VCK5000 is another such outlier.

First Generation Versal SoC Series Table

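============== ================================== ======== ====== =============================================
Series         Products                           AIE-Type Number Notes
============== ================================== ======== ====== =============================================
Versal Premium VPK120 VPK180                      N/A      0      Eval kit
Versal Premium VP2502                             AIE1     472
Versal Premium VP2802                             AIE1     472
Versal Premium VP1002 VP1052 VP1102 VP1202 VP1402 N/A      0
               VP1502 VP1552 VP1702 VP1802 VP1902
Versal AI Edge VEK280                             AIE-ML   304    Eval kit
Versal AI Edge VE2002                             AIE-ML   8
Versal AI Edge VE2102                             AIE-ML   12
Versal AI Edge VE2202                             AIE-ML   24
Versal AI Edge VE2302                             AIE-ML   34
Versal AI Edge VE2602                             AIE-ML   152
Versal AI Edge VE2802                             AIE-ML   304
Versal AI Edge VE1752                             AIE1     304
Versal RF      VR1602                             AIE1     126
Versal RF      VR1652                             AIE1     126
Versal RF      VR1902                             AIE1     120
Versal RF      VR1952                             AIE1     120
Versal AI Core VCK190                             AIE1     400    Eval kit
Versal AI Core VC1502                             AIE1     198
Versal AI Core VC1702                             AIE1     304
Versal AI Core VC1802                             AIE1     300
Versal AI Core VC1902                             AIE1     400
Versal AI Core VC2602                             AIE-ML   152
Versal AI Core VC2802                             AIE-ML   304
Versal AI Core ??? Alveo V70                      AIE-MLv2 0      Discontinued [9]; product details are dubious
Versal AI Core VC1902                             AIE1     400    VCK5000 (eval board) [10]
Versal Prime   VMK180                             N/A      0      Eval kit
Versal Prime   VM1102 VM1302 VM1402 VM1502 VM1802 N/A      0
               VM2152 VM2202 VM2302 VM2502 VM2902
Versal HBM     VHK158                             N/A      0      Eval kit
Versal HBM     VM1102 VM1302 VM1402 VM1502 VM1802 N/A      0
               VM2152
Versal HBM     XCV80                              N/A      0      Alveo V80
============== ================================== ======== ====== =============================================

[7] [6]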

Discontinuation notice of FPGA boards. [6]

Second Generation Versal SoC Series Table

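==================== =============================== ======== ====== ====================
Series               Products                        AIE-Type Number Notes
==================== =============================== ======== ====== ====================
Versal Premium Gen 2 2VP3102 2VP3202 2VP3402 2VP3602 N/A      0
Versal AI Edge Gen 2 2VE3304 2VE3358                 AIE-MLv2 24     “The Versal AI Edge Series Gen 2 is currently in Early Access.” [v-edge-gen2]
Versal AI Edge Gen 2 2VE3504                         AIE-MLv2 96
Versal AI Edge Gen 2 2VE3558                         AIE-MLv2 96
Versal AI Edge Gen 2 2VE3804                         AIE-MLv2 144
Versal AI Edge Gen 2 2VE3858                         AIE-MLv2 144
Versal Prime Gen 2   2VM3358 2VM3558 2VM3654 2VM3858 N/A      0
==================== =============================== ======== ====== ====================

[7]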

As before, other Alveo cards are not included because they are not Versal-based.

A much more reliable way to check the number and generation of AIE tiles on a device is to open Vitis, select your target, and open an Array View. Given the large number of SoC variants and boards, the tables above may contain mistakes or missing entries.

CPUs are much simpler. In this context, the AIE is always an NPU built on either the XDNA or the XDNA2 architecture, which use AIE-ML and AIE-MLv2 tiles, respectively.

AMD makes the list of Ryzen AI CPUs much easier to find. The generations with NPUs are Ryzen 7040 (“Phoenix”), Ryzen 8040 (“Hawk Point”), and Ryzen AI 300 (“Strix Point”). [12] There are also mentions of Krackan Point, a mobile APU. [11] You can view a complete list of processors that have an NPU at AMD Ryzen AI.

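========================== ======== ====== ===============================================
Series                     AIE-Type Number Notes
========================== ======== ====== ===============================================
Ryzen AI 7040 (Phoenix)    AIE-ML   16     NPU, XDNA architecture
Ryzen AI 8040 (Hawk Point) AIE-ML   16     NPU, XDNA architecture
Ryzen AI 300 (Strix Point) AIE-MLv2 32     NPU, XDNA2 architecture (includes Ryzen AI Max)
========================== ======== ====== ===============================================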
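If a build or test script needs to branch on the NPU generation, the table above can be encoded as a small lookup. This is a hypothetical sketch: the enum, struct, and function names are our own, and the tile counts are copied from the table rather than queried from the hardware.

```cpp
#include <optional>
#include <string_view>

// Hypothetical names; values are transcribed from the CPU table above.
enum class TileGen { AIE_ML, AIE_MLv2 };

struct NpuInfo {
    std::string_view arch;  // NPU architecture: "XDNA" or "XDNA2"
    TileGen tile;           // tile generation used by that architecture
    int tiles;              // tile count as listed in the table
};

// Look up NPU details by processor codename; std::nullopt if not in the table.
std::optional<NpuInfo> npu_for_codename(std::string_view codename) {
    if (codename == "Phoenix" || codename == "Hawk Point") {
        return NpuInfo{"XDNA", TileGen::AIE_ML, 16};
    }
    if (codename == "Strix Point") {
        return NpuInfo{"XDNA2", TileGen::AIE_MLv2, 32};
    }
    return std::nullopt;  // e.g. Krackan Point is not in the table
}
```

Returning std::nullopt for codenames the table does not cover, such as Krackan Point, forces callers to handle those parts explicitly instead of silently assuming a generation.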

Architecture Differences

Since the naming scheme does little to distinguish the versions, we've dedicated this section to explaining their differences.

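============================ ========================== ========================== ====================================
Feature                      AIE1 tile                  AIE-ML tile                AIE-MLv2 tile (little documentation)
============================ ========================== ========================== ====================================
Clock speed [*]              1 GHz                      1 GHz                      ??
Per-tile data memory         32 KiB, 4 memory banks     64 KiB + (125-750 KiB      ??
                                                        shared); 8 memory banks
                                                        shared with 3 neighbors
Integer types (bits)         8, 16, 32                  4, 8, 16, 32               ??
                             (see AIE function list)    (see AIE-ML function list)
Vector integer types (bits)  8, 16, 32                  4, 8, 16                   ??
Floating-point types (bits)  32                         16, 32                     AIE-ML + BF16
Vector floating-point (bits) 32                         None                       ??
Interconnects                Shared memory buffer;      Shared memory buffer;
                             2 input and 2 output       1 input and 1 output
                             streaming buffers          streaming buffer
AIE-ML memory tile           No                         Yes                        Yes
Cascade stream               Horizontal; rows alternate Two streams: one directed
                             directions                 down, the other right
NPU array (XDNA/XDNA2)                                  4 rows, 5 columns          4 rows, 8 columns
============================ ========================== ========================== ====================================

The AIE-ML memory tile is not part of the AIE tile, but it is always included alongside it.

[4] [3] [15] [16] [18] [14]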
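As a quick worked example with the per-tile data memory figures from the table (ignoring memory tiles and any shared memory), the total tile-local data memory of a device is its tile count times the per-tile capacity. The tile counts come from the Versal tables above; the constant and function names are our own.

```cpp
#include <cstdint>

// Per-tile data memory from the architecture table (memory tiles excluded).
constexpr std::uint64_t kAie1TileKiB = 32;
constexpr std::uint64_t kAieMlTileKiB = 64;

// Total tile-local data memory, in KiB, for a device with `tiles` compute tiles.
constexpr std::uint64_t total_tile_memory_kib(std::uint64_t tiles, std::uint64_t per_tile_kib) {
    return tiles * per_tile_kib;
}

// VC1902 (AIE1, 400 tiles): 400 * 32 KiB = 12,800 KiB = 12.5 MiB
static_assert(total_tile_memory_kib(400, kAie1TileKiB) == 12800);
// VE2802 (AIE-ML, 304 tiles): 304 * 64 KiB = 19,456 KiB = 19 MiB
static_assert(total_tile_memory_kib(304, kAieMlTileKiB) == 19456);
```

This aggregate is spread across many small per-tile memories rather than one large pool, so on its own it says little about how much data a single kernel can hold.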

The clock, voltage, frequency, power distribution, PL and NoC interfaces, and debug/trace functionality are the same across AIE1 and AIE-ML. [18]

Single-precision floating point is supported, but it is not fully IEEE 754 compliant and may produce rounding errors. [4] In fact, some operation types are implemented by converting to another type and are not native. Therefore, even when a type is listed as supported, it is important to check whether the architecture supports those operations natively.
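As a generic illustration of how converting between floating-point formats loses precision, the sketch below rounds a float32 to bfloat16 (round-to-nearest-even) and back, then validates the result with a relative tolerance instead of exact equality. It is a plain C++ software model with our own function names, not the AIE hardware conversion path, whose rounding behavior may differ.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Convert float32 to bfloat16 bits with round-to-nearest-even.
// NaN handling is omitted for brevity: this sketch assumes finite inputs.
std::uint16_t float_to_bf16(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    const std::uint32_t lsb = (bits >> 16) & 1u;  // lowest bit that bf16 keeps
    bits += 0x7FFFu + lsb;                         // round to nearest, ties to even
    return static_cast<std::uint16_t>(bits >> 16);
}

// Widen bfloat16 bits back to float32 (exact: append 16 zero bits).
float bf16_to_float(std::uint16_t h) {
    const std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Relative-tolerance comparison; a zero reference requires an exact match.
bool nearly_equal(float got, float want, float rel_tol) {
    return std::fabs(got - want) <= rel_tol * std::fabs(want);
}
```

For example, 1.01171875f becomes 1.015625f after the round trip, so an exact == check fails while a relative tolerance of 1e-2 passes. When validating kernel output against a reference, a tolerance-based comparison like this is usually the safer choice.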

So far, XDNA2 is only available to customers in Strix Point processors. Details on the AIE-MLv2 are also still extremely limited, but we list what is known here. [20] The scaling factors below are relative to the previous XDNA generation.

XDNA2
  • 50 INT8 TOPS and 50 Block FP16 TFLOPS

  • 1.6x On-Chip Memory

  • Block floating point support

  • Better support for non-linear functions

  • 50% weight sparsity

  • 2x more concurrent spatial streams (Up to 8 concurrent Isolated Spatial Streams)

  • Per Column Power Gating

  • 2x Perf/W

[19] [20] [14]

FAQ

Q: What does XDNA stand for?
A: Nobody knows. Some speculate it stands for Xilinx DNA, or maybe eXtreme DNA (like the AMD XCD chiplet), but we can't find any official statement.
Q: How should I choose between AIE1 and AIE-ML (or AIE-MLv2)?
A: Official AMD documentation:

The initial version, AI Engine (AIE), is optimized for DSP and communication applications, while the AI Engine-Machine Learning (AIE-ML) introduces a version optimized for machine learning. [18]
Q: Does the AI in AIE really stand for Artificial Intelligence? Not Adaptive or Accelerated Integration or Intelligence?
A: While this does seem like an obvious question, we could not find the AIE called the "Artificial Intelligence Engine" anywhere. Sources are scarce, but a definitive one is the 2018 Xilinx white paper WP506, “AI Engines and Their Applications”, which specifies that AI stands for Artificial Intelligence.
[*] Q: Erm, the XDNA page shows that the AIE-ML configuration has a horizontal cascade stream only on the top row. It also says the AIE-ML engines run at 1.3 GHz, not 1 GHz.
A: You're absolutely correct! It does! But the AIE-ML documentation shows a horizontal cascade stream for all tiles and a normal 1 GHz clock. [16] Since the XDNA page says the AIE-ML can ‘run over 1.3GHz’, perhaps a clock frequency beyond the documented FMax is at play. There are also a couple of other mistakes in press releases, such as the Ryzen AI announcement, which shows AIE rather than AIE-ML tiles. The tile diagram in the AMD product guide is also wrong, because its cascades do not alternate. Yeah, there are a few errors here and there.