Vitis AI 3.0

Vitis™ AI Platform 3.0 Release Highlights

  • AI Engine-ML enablement - early access on Alveo™ V70 data center accelerator card and Versal™ AI Edge series VEK280 evaluation kit
  • Improved custom model deployment with ONNX Runtime
  • Improved WeGO ease-of-use with quantizer integration and model coverage 

Vitis AI Platform - What’s New by Category

Expand the sections below to learn more about the new features and enhancements in Vitis AI platform 3.0. For more information on the supported models, quantizer, compiler, or the DPU IPs, please check the GitHub repository or email: amd_ai_mkt@amd.com.

  • Added 14 new models and deprecated 28 models for a total of 130 models
  • Optimized models for applications:
    • For AI medical and image enhancement: Super Resolution 4X, 2D/3D Semantic Segmentation
  • Optimized models for benchmarks:
    • MLPerf: 3D-Unet
    • FAMBench: MaskRCNN
  • Optimized backbones:
    • Provides YOLO variants (YOLOX, YOLOv4/v5/v6) and EfficientNet-Lite
  • Ease-of-use enhancements: model data published on GitHub.IO to improve the user experience
  • Better support for UIF and improved UIF compatibility
  • 72 PyTorch/TensorFlow models for CPUs with ZenDNN
  • Added GPU models for AMD GPUs based on ROCm+MIGraphX
  • Support for TensorFlow 2.10
  • Updated the Vitis Inspector to show more accurate partition results from XCompiler for various DPU architectures
  • Added support for data type conversions for float models, including FP16, BFloat16, FP32, and double
  • Added support for exporting the ONNX format of a quantized model
  • Support for more layers: SeparableConv2D and PReLU
  • Added support for unsigned integer quantization
  • Added support for automatic modification of input shapes for models with variable input shapes
  • Added support for aligning input and output quantize positions for the Concat and Pooling layers
  • Added error codes and improved the readability of error and warning messages
  • Various bug fixes
  • Separated the quantizer codes from the TensorFlow codes, making it a plug-in module to the official TensorFlow library
  • Added support for exporting the ONNX format of a quantized model
  • Added support for data type conversions for float models, including FP16, BFloat16, FP32 and double
  • Support for more operations, including Max, Transpose, and DepthToSpace
  • Added support for aligning input and output quantize positions of Concat and Pooling operations
  • Added support for automatically replacing Softmax with Hard-Softmax operations
  • Added error codes and improved the readability of error and warning messages
  • Various bug fixes
  • Support for PyTorch 1.11 and 1.12
  • Support for exporting quantized models in TorchScript format
  • QAT support for exporting trained models to ONNX and TorchScript
  • Support for FP16 model quantization
  • Optimized Inspector to support more pattern types and backwards compatibility of device assignments
  • More PyTorch operators: More than 560 types of PyTorch operators are supported
  • Enhanced parsing to support control flow parsing
  • Enhanced message system with more useful message text
  • Support for fusing and quantization of BatchNorm without affine calculation
  • Support for Keras layers of ConvTranspose2D, Conv3D, ConvTranspose3D
  • Support for TFOpLambda operations
  • Added pruning config that allows users to specify pruning hyper-parameters
  • Specific exception types are defined for each type of error
  • Support for TensorFlow 2.10
  • Added fine-grained model pruning: Sparsity
  • OFA support for convolution layers with kernel=(1,3) and dilation
  • OFA support for ConvTranspose2D
  • Added pruning config that allows users to specify pruning hyper-parameters
  • Specific exception types are defined for each type of error
  • Enhanced parallel model analysis: More robust
  • Support for PyTorch 1.11 and 1.12
  • Support for new operators: strided_slice, cost volume, correlation 1d&2d, argmax, group conv2d, reduction_max, reduction_mean
  • Support for new hardware platform: DPUCV2DX8G
  • Improved error messages
  • Improved partition messages
  • Support for Versal AI Edge series VEK280 evaluation kit and Alveo V70 accelerator card
  • Support for ONNX Runtime with 11 new examples provided
  • Support for 15 new models
  • Added 4 new model libraries
  • Improved error messages
  • Support for new DPUCV2DX8G DPU IP
  • Memory bandwidth profiling solution for Versal platforms
  • Upgraded to 2022.2
  • New features:
    • Support for 1D and 2D Correlation
    • Support for Argmax and Max
  • Optimized resources and timing
  • Upgraded to 2022.2
  • New features:
    • Support for 1D and 2D Correlation
    • Support for Cost-Volume
    • Support for Argmax and Max along channel dimensions
  • Optimized resources and timing
  • Early Access release
  • Support for the most common 2D operators
  • Support for Batch 1 to 13
  • Support for 90+ CNN models
  • Updated Vitis tool from 2021.2 to 2022.2
  • Added scripts to improve timing in released XO flow
  • Early Access release
  • Support for the most common 2D operators
  • Support for Batch 13
  • Support for 70+ CNN models
  • Integrated WeGO with the quantizer for on-the-fly quantization and improved ease of use
  • Introduced serialization and deserialization over the WeGO flow to offer the capability of building once and running anytime
  • Incorporated AMD ZenDNN into WeGO to bring additional optimization opportunities on AMD EPYC CPUs
  • Improved WeGO robustness to offer solid deployment experience for more models
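Several quantizer items above refer to "quantize positions" and to aligning them for Concat and Pooling. A minimal sketch of power-of-two quantization (all names here are illustrative, not the Vitis AI quantizer API) shows why alignment matters: once Concat inputs share one quantize position, the DPU can concatenate the raw INT8 buffers directly.

```python
def quantize(x, pos, bits=8):
    """Quantize a float to a signed integer with `pos` fractional bits."""
    q = round(x * (2 ** pos))
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return max(lo, min(hi, q))

def dequantize(q, pos):
    return q / (2 ** pos)

# Two Concat inputs quantized at different positions cannot share one
# integer buffer; aligning both to the smaller position (the coarser
# scale) lets the hardware concatenate the raw INT8 values directly.
a_pos, b_pos = 6, 4                  # positions chosen by calibration
common_pos = min(a_pos, b_pos)       # align: both inputs now use pos=4
a = quantize(0.71875, common_pos)
b = quantize(1.5, common_pos)
print(dequantize(a, common_pos), dequantize(b, common_pos))  # 0.75 1.5
```

The same alignment idea applies to Pooling, whose output must reuse its input's quantize position so no rescaling is needed in the integer datapath.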
Vitis AI 2.5

Vitis™ AI 2.5 Release Highlights

  • AI Model Zoo added
    • 14 new models, including Bidirectional Encoder Representations from Transformers (BERT)-based Natural Language Processing (NLP), Vision Transformer (ViT), Optical Character Recognition (OCR), Simultaneous Localization and Mapping (SLAM), and more Once-for-All (OFA) models
    • 38 base and optimized models for AMD EPYC™ server processors
  • AI Quantizer added model inspector, now supports TensorFlow 2.8 and PyTorch 1.10
  • Whole Graph Optimizer (WeGO) supports PyTorch 1.x and TensorFlow 2.x
  • Deep-learning Processor Unit (DPU) for Versal™ ACAP supports multiple Compute Units (CUs), new Arithmetic Logic Unit (ALU) engine, Depthwise convolution and more operators (OPs) supported by the DPUs on VCK5000 Versal development card and Alveo™ data center accelerator cards
  • Inference server supports AMD ZenDNN as backend on AMD EPYC™ server processors
  • New examples added to Whole Application Acceleration (WAA) for VCK5000 card and Zynq™ UltraScale+™ ZCU102/ZCU104 evaluation kits

Vitis AI 2.5 What’s New by Category

Expand the sections below to learn more about the new features and enhancements.

  • Added 14 new models for a total of 134 models available
  • Expanded model categories for diverse AI workloads:
    • Added CNN models for text detection and E2E OCR
    • Added BERT-base NLP model and ViT
    • Added more OFA-optimized models, including super-resolution OFA-RCAN and object detection OFA-YOLO
    • Added models for industrial vision and SLAM, including interest point detection and description model and hierarchical localization model
  • Added 38 base and optimized models for AMD EPYC server processors
  • Ease of use enhancement:
    • Improved model index by application categories
  • Added model inspector that inspects a float model and shows partition results
  • Support for TensorFlow 2.8 and PyTorch 1.10
  • Float-scale and per-channel quantization support
  • Configuration support for different quantize strategies
  • OFA enhancement
    • Support for convolutions with even kernel sizes
    • ConvTranspose2d support
    • Updated examples
  • One-step and iterative pruning enhancement
    • Support for resuming model analysis or search after an exception
  • ALU for DPUCZDX8G support
  • New models added in this release
  • Added 6 new model libraries
  • Supports 17 new models
  • Custom OP enhancement
  • Added new CPU operators
  • Xdputil tool enhancement
  • Two new demos on the VCK190 kit
  • Full support on custom OP and graph runner
  • Stability optimization
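The float-scale and per-channel quantization support noted above can be sketched in a few lines. This is an illustrative toy in plain Python, not the vai_q_* tooling: each output channel gets its own float scale, so a channel with tiny weights is not crushed by a sibling channel's large range.

```python
def channel_scales(weights, bits=8):
    """One float scale per output channel: max(|w|) / 127 for INT8."""
    qmax = 2 ** (bits - 1) - 1
    return [max(abs(w) for w in ch) / qmax for ch in weights]

def quantize_per_channel(weights, scales):
    return [[round(w / s) for w in ch] for ch, s in zip(weights, scales)]

# Two output channels with very different ranges: a single per-tensor
# scale would collapse the second channel to a handful of integer
# levels, while per-channel scales use the full INT8 range for both.
w = [[0.5, -1.27, 0.9],        # channel 0: large range
     [0.004, -0.002, 0.001]]   # channel 1: tiny range
scales = channel_scales(w)
q = quantize_per_channel(w, scales)
```

With per-channel scales, both channels reach the INT8 extremes (q[0][1] == -127, q[1][0] == 127) instead of the small channel rounding toward zero.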

Edge DPU - DPUCZDX8G

  • New ALU engine that replaces the pooling engine and the Depthwise convolution engine in MISC. The ALU engine supports:
    • New features such as large-kernel MaxPool, AveragePool, rectangular-kernel AveragePool, and 16-bit const weights
    • HardSigmoid and HardSwish
    • DepthWiseConv + LeakyReLU
    • Parallelism configuration
  • New DPU IP and targeted reference design (TRD) on the ZCU102 kit with encrypted RTL IP on Vitis 2022.1 platform
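HardSigmoid and HardSwish, newly handled by the ALU engine, are the usual piecewise-linear definitions (as popularized by MobileNetV3); they need only add, multiply, and clamp, which is what makes them ALU-friendly. A reference sketch:

```python
def relu6(x):
    """Clamp to [0, 6] - the only nonlinearity these ops need."""
    return min(max(x, 0.0), 6.0)

def hard_sigmoid(x):
    return relu6(x + 3.0) / 6.0

def hard_swish(x):
    return x * hard_sigmoid(x)

print(hard_sigmoid(0.0))   # 0.5
print(hard_swish(3.0))     # 3.0 (saturated region: acts like identity)
```

Because both functions are exact piecewise-linear expressions rather than exponentials, they quantize cleanly and map onto the ALU's add/multiply/clamp datapath.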

Edge DPU - DPUCVDX8G

  • Optimized ALU that better supports features like channel-attention
  • Multiple CUs support
  • DepthWiseConv + LeakyReLU support
  • New DPU IP for Versal ACAP and TRD on the VCK190 kit with encrypted RTL and AI Engine code, which still supports C32B1-6/C64B1-5 based on the Vitis 2022.1 platform

Cloud DPU - DPUCVDX8H

  • Enlarged Depthwise convolution kernel size range from 1x1 to 8x8
  • AI Engine-based pooling and elementwise add/multiply, including large-kernel pooling
  • More Depthwise convolution kernel sizes

Cloud DPU - DPUCADF8H

  • Support for ReLU6/LeakyReLU and the MobileNet series of models
  • Fixed issue of missing directories in some cases in the .XO flow
  • PyTorch 1.x and TensorFlow 2.x in-framework inference support
  • Added 19 PyTorch 1.x/TensorFlow 2.x/TensorFlow 1.x examples, including classification, object detection, and segmentation
  • Added gRPC API to inference server flow
  • TensorFlow/PyTorch with AMD ZenDNN as backend support
  • New examples for the VCK5000 card and ZCU104 kit - ResNet & adas_detection applications
  • New ResNet example containing AI Engine-based pre-processing kernel 
  • Xclbin generation using pre-built DPU flow for the Alveo U50 card and ZCU102 kit - ResNet and adas_detection applications
  • Xclbin generation using build flow for the ZCU104 and VCK190 kit - ResNet and adas_detection applications
  • Ported all VCK190 examples to the production board using the base platform
Vitis AI 2.0

Vitis AI 2.0 Release Highlights

  • General Availability (GA) for VCK190 (Production Silicon), VCK5000 (Production Silicon) and U55C
  • Added support for newer PyTorch and TensorFlow versions: PyTorch 1.8-1.9, TensorFlow 2.4-2.6
  • 22 additional new models, including Solo, Yolo-X, UltraFast, CLOCs, PSMNet, FairMOT, SESR, DRUNet, SSR as well as 3 NLP models and 2 OFA (Once-for-all) models
  • Added new custom OP flow to run models with DPU-unsupported OPs, with enhancements across quantizer, compiler, and runtime
  • Additional layers and configurations of DPU for VCK190 and DPU for VCK5000
  • Added OFA pruning and TF2 keras support for AI optimizer
  • Run inference directly from Tensorflow (Demo) for cloud DPU

Vitis AI 2.0 What’s New by Category

Expand the sections below to learn more about the new features and enhancements.

  • 22 new models added, 130 total
    • 19 new Pytorch models including 3 NLP and 2 OFA models
    • 3 new Tensorflow models
  • Added new application models
    • AD/ADAS: Solo for instance segmentation, Yolo-X for traffic sign detection, UltraFast for lane detection, CLOCs for sensor fusion
    • Medical: SESR for super resolution, DRUNet for image denoise, SSR for spectral remove
    • Smart city and industrial vision: PSMNet for binocular depth estimation, FairMOT for joint detection and Re-ID
  • EoU Enhancements
    • Updated automatic script to search and download required models
  • TF2 quantizer
    • Add support for TF 2.4-2.6
    • Add support for custom OP flow, including shape inference, quantization and dumping
    • Add support for CUDA 11
    • Add support for input_shape assignment when deploying QAT models
    • Improve support for TFOpLambda layers
    • Update support for hardware simulation, including sigmoid layer, leaky_relu layer, global and non-global average pooling layer
    • Bug fixes for sequential models and quantize position adjustment
  • TF1 quantizer
    • Add quantization support for new ops, including hard-sigmoid, hard-swish, element-wise multiply ops
    • Add support for replacing normal sigmoid with hard sigmoid
    • Update support for float weights dumping when dumping golden results
    • Bug fixes for inconsistencies between the Python and CLI APIs
  • Pytorch quantizer
    • Add support for pytorch 1.8 and 1.9
    • Support CUDA 11
    • Support custom OP flow
    • Improve fast finetune performance on memory consumption and accuracy
    • Reduced feature-map memory consumption during quantization
    • Improve QAT functions including better initialization of quantization scale and new API for getting quantizer’s parameters
    • Support quantization of more operations: some 1D and 3D ops, DepthwiseConvTranspose2D, pixel-shuffle, pixel-unshuffle, const
    • Support CONV/BN merging in pattern of CONV+CONCAT+BN
    • Message enhancements to help users locate problems
    • Bug fixes for consistency with hardware
  • TensorFlow 1.15
    • Support tf.keras.Optimizer for model training
  • TensorFlow 2.x
    • Support TensorFlow 2.3-2.6
    • Add iterative pruning
  • PyTorch
    • Support PyTorch 1.4-1.9.1
    • Support shared parameters in pruning
    • Add one-step pruning
    • Add once-for-all (OFA)
    • Unified APIs for iterative and one-step pruning
    • Enable pruned model to be used by quantizer
    • Support nn.Conv3d and nn.ConvTranspose3d
  • DPU on embedded platforms
    • Support and optimize conv3d, transposedconv3d, upsample3d and upsample2d for DPUCVDX8G(xvDPU)
    • Improve the efficiency of high resolution input for DPUCVDX8G(xvDPU)
    • Support ALUv2 new features
  • DPU on Alveo/Cloud
    • Support depthwise-conv2d, h-sigmoid and h-swish for DPUCVDX8H(DPUv4E)
    • Support depthwise-conv2d for DPUCAHX8H(DPUv3E)
    • Support high resolution model inference
  • Support custom OP flow
  • Support all the new models in Model Zoo: end-to-end deployment in Vitis AI Library
  • Improved GraphRunner to better support custom OP flow
  • Add examples on how to integrate custom OPs
  • Add more pre-implemented CPU OPs
  • DPU driver/runtime update to support AMD Device Tree Generator (DTG) for Vivado flow
  • Support CPU tasks tracking in graph runner
  • Better memory bandwidth analysis in text summary
  • Better performance to enable the analysis of large models
  • CNN DPU for Zynq SoC / MPSoC, DPUCZDX8G (DPUv2)
    • Upgraded to 2021.2
    • Update interrupt connection in Vivado flow
  • CNN DPU for Alveo-HBM, DPUCAHX8H (DPUv3E)
    • Support depth-wise convolution
    • Support U55C
  • CNN DPU for Alveo-DDR, DPUCADF8H (DPUv3Int8)
    • Updated U200/U250 xclbins with XRT 2021.2
    • Released XO Flow
    • Released IP Product Guide (PG400)
  • CNN DPU for Versal, DPUCVDX8G (xvDPU)
    • C32 (32 AIE cores for a single batch) and C64 (64 AIE cores for a single batch) configurable
    • Support configurable batch size 1~5 for C64
    • Support and optimize new OPs: conv3d, transposedconv3d, upsample3d and upsample2d
    • Reduce Conv bubbles and compute redundancy
    • Support 16-bit const weights in ALUv2
  • CNN DPU for Versal, DPUCVDX8H (DPUv4E)
    • Support depth-wise convolution with 6 PE configuration
    • Support h-sigmoid and h-swish
  • Upgrade to Vitis and Vivado 2021.2
  • Custom plugin example: PSMNet using Cost Volume (RTL Based) accelerator on VCK190
  • New accelerator for Optical Flow (TV-L1) on U50
  • High resolution segmentation application on VCK190
  • Options to compare throughput & accuracy between FPGA and CPU Versions
    • Throughput improvements ranging from 25% to 368%
  • Reorganized for better usability and visibility
  • Provides new capability of deploying models with DPU unsupported OPs
    • Define custom OPs in quantization
    • Register and implement custom OPs before the deployment by graph runner
  • Add two examples
    • Pointpillars Pytorch model
    • MNIST Tensorflow 2 model
  • Add support of DPUs for U50 and U55C
  • Run inference directly from Tensorflow framework for cloud DPU
    • Automatically perform subgraph partitioning and apply optimization/acceleration for DPU subgraphs
    • Dispatch non-DPU subgraphs to TensorFlow running on CPU
  • Resnet50 and Yolov3 demos on VCK5000
  • Support xmodel serving in cloud / on-premise (EA)
  • vai_q_caffe hangs when TRAIN and TEST phases point to the same LMDB file
  • TVM compiled Inception_v3 model gives low accuracy with DPUCADF8H (DPUv3Int8)
  • TensorFlow 1.15 quantizer error in QAT caused by an incorrect pattern match
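The CONV/BN merging listed for the PyTorch quantizer rests on standard BatchNorm folding: the BN affine transform is baked into the convolution's weights and bias so inference skips BN entirely. A scalar per-channel sketch (illustrative, not the quantizer's internal implementation):

```python
import math

def fold_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Return (w', b') such that conv(x, w') + b' == BN(conv(x, w) + b)."""
    s = gamma / math.sqrt(var + eps)
    return w * s, (b - mean) * s + beta

# BN(y) = gamma * (y - mean) / sqrt(var + eps) + beta; folding bakes the
# affine transform into the conv parameters, per output channel.
w2, b2 = fold_bn(w=2.0, b=1.0, gamma=0.5, beta=0.1, mean=1.0, var=4.0)
y_folded = w2 * 3.0 + b2   # folded conv applied to input x = 3.0
y_ref = 0.5 * ((2.0 * 3.0 + 1.0) - 1.0) / math.sqrt(4.0 + 1e-5) + 0.1
print(abs(y_folded - y_ref) < 1e-9)  # True
```

The CONV+CONCAT+BN pattern mentioned above is the same fold applied through a Concat, where each BN channel maps back to a channel of one of the concatenated convolutions.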
Vitis AI 1.4

Vitis AI 1.4 Release Highlights

  • Support new platforms, including Kria KV260 SoM kit and Versal ACAP platforms VCK190, VCK5000; 
  • Extended Pytorch framework support from version 1.5 to version 1.7.1;
  • Added new state-of-the-art models, including 4D Radar detection, Image-Lidar sensor fusion, 3D detection & segmentation, multi-task, depth estimation, super resolution, and more models applicable to automotive, smart medical, and industrial vision applications;
  • Easier subgraph partition user experience with the new Graph Runner API;
  • Improved performance;

Vitis AI 1.4 What’s New by Category

Expand the sections below to learn more about the new features and enhancements in Vitis AI 1.4.

  1. Added 16 new models, and total 108 models from different deep learning frameworks (Caffe, TensorFlow, TensorFlow 2 and PyTorch) are provided.
  2. Increased the diversity of models compared to Vitis AI 1.3:
    1. For autonomous driving and ADAS, added 4D Radar detection, Image-Lidar sensor fusion, surround-view 3D detection, upgraded 3D segmentation and multi-task models
    2. For medical and industrial vision, added depth estimation, RGB-D segmentation, super-resolution and other reference models
  3. EoU enhancement: provided automated download scripts for free selection of the versions according to model name and hardware platform
  1. Support fast finetune in post-training quantization (PTQ)
  2. Improved quantization-aware training (QAT) functions
  3. Support more layers:
    1. swish/sigmoid, hard-swish, hard-sigmoid, LeakyReLU
    2. Nested tf.keras functional and sequential models
  4. Support new models: EfficientNet, EfficientNetLite, MobileNetV3, Yolov3 and Tiny Yolov3
  5. Support custom layers via subclassing tf.keras.layers and custom quantization strategies
  6. Improved ease-of-use and fixed bugs
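hard-sigmoid support appears throughout the quantizer lists above because it is a DPU-friendly piecewise-linear stand-in for sigmoid. A quick sweep (plain Python, illustrative only) bounds the worst-case approximation error of the swap:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hard_sigmoid(x):
    """Piecewise-linear approximation: clamp(x + 3, 0, 6) / 6."""
    return min(max(x + 3.0, 0.0), 6.0) / 6.0

# Worst-case absolute error over a coarse sweep of [-8, 8]. It is small
# enough that accuracy usually survives the replacement, but the swap
# should still be validated per model.
err = max(abs(sigmoid(i / 10.0) - hard_sigmoid(i / 10.0)) for i in range(-80, 81))
print(round(err, 3))
```

Outside [-3, 3] the hard variant saturates exactly to 0 or 1, so the error there is just the sigmoid's remaining tail.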
  1. Support Pytorch 1.5-1.7.1
  2. Support activations
    1. hard-swish, hard-sigmoid
  3. Support more operators:  
    1. Const, Upsample, etc.
  4. Support shared parameters in quantization
  5. Enhanced quantization profiling and error check functions
  6. Improved QAT functions:
    1. support training from PTQ results
    2. support reused modules
    3. support resuming training
  1. Support tf.keras APIs in TF1
  2. Supports single GPU mode for model analysis 
  1. Improved ease-of-use with simplified APIs
  2. Support torch.nn.ConvTranspose2d
  3. Support reused modules
  1. Support ALU for DPUCVDX8G (xvDPU)
  2. Support cross-layer prefetch optimization option
  3. Support xmodel output nodes assignment
  4. Enabled features to implement zero-copy for:
    1. DPUCZDX8G (DPUv2)
    2. DPUCAHX8H (DPUv3E)
    3. DPUCAHX8L (DPUv3ME)
  5. Open-sourced network visualization tool Netron officially supports AMD XIR 
  1. Support the 16 new models in AI Model Zoo:
    1. 11 new Pytorch models
    2. 5 new Tensorflow models, 1 from Tensorflow 2.x
    3. 1 new Caffe model
  2. Introduced new deploy APIs graph_runner, especially for models with multiple subgraphs
  3. Introduced new tool xdputil for DPU and xmodel debug
  4. Support new KV260 SoM kit
  5. Support DPUCVDX8G (xvDPU) on VCK190
  6. Support DPUCVDX8H (DPUv4E) on VCK5000
  1. Support Versal platforms VCK190 and VCK5000
  2. Support Petalinux 2021.1, OpenCV v4 
    1. EoU improved by updating the samples to use INT8 input, reducing FP32-to-INT8 conversion overhead
  1. Support new DPU IPs:
    1. DPUCVDX8G (xvDPU)
    2. DPUCAHX8L (DPUv3ME)
    3. DPUCVDX8H (DPUv4E)
  2. Support DPUv2 & xvDPU in vivado flow
  3. Memory IO statistics
  4. EoU improved
  1. DPUv2 IP upgraded to 2021.1
  1. VCK190 xvDPU TRD
  2. Support configurable batch size 1~6 based on C32 mode
  3. PL support new OPs:
    1. Global Average Pooling up to 256x256, Element Multiply, Hardsigmoid and Hardswish
  4. More models deployed 
  1. Released XO in Vitis AI 1.4
  1. Support latest U250 platform (2020.2) 
  2. Support latest U200 platform (2021.1)
  3. Bug fixes
  1. Improved the DPU performance of small networks processing with weight pre-fetch function
  1. Multi Object Tracking (SORT) example on ZCU102 provided
  2. Classification App example for Versal (VCK190) provided
  3. Updated existing examples to XRT APIs and zero copy
  4. U200 (DPUv3INT8) TRD provided
  5. Ported U200/250 examples to use DPUv3INT8 instead of DPUv1
  6. Example for xRNN pre-processing acceleration (embedding layer)
  7. SSD MobileNet U280 example now accelerates both pre and post-processing on hardware
  1. Support of all DPUs - ZCU102/4, U50, U200, U250, U280
  2. Using Petalinux for edge devices
  3. Increased throughput using AKS at the application level
  4. Yolov3 tutorial as python notebook
  1. Unified DPU kernels into one and added samples for Alveo U200/250 (DPUv3INT8), U280, U50, U50lv
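The graph_runner API introduced in this release exists to hide subgraph partitioning from the user: consecutive DPU-supported ops form DPU subgraphs, and everything else falls back to CPU. A toy sketch of the idea (not the XIR implementation; the op names and supported-op set below are made up):

```python
# Illustrative subset of ops a DPU might support; not the real list.
DPU_OPS = {"conv2d", "relu", "pool", "eltwise_add"}

def partition(ops):
    """Group a linear op sequence into ("DPU"|"CPU", [ops]) runs."""
    groups = []
    for op in ops:
        target = "DPU" if op in DPU_OPS else "CPU"
        if groups and groups[-1][0] == target:
            groups[-1][1].append(op)   # extend the current run
        else:
            groups.append((target, [op]))  # start a new run
    return groups

model = ["conv2d", "relu", "argmax_custom", "conv2d", "pool", "softmax"]
print(partition(model))
# → [('DPU', ['conv2d', 'relu']), ('CPU', ['argmax_custom']),
#    ('DPU', ['conv2d', 'pool']), ('CPU', ['softmax'])]
```

Real models are graphs rather than linear sequences, so the actual partitioner works on the xmodel graph; the custom OP flow above is what lets the CPU runs (like the hypothetical `argmax_custom` here) execute user-registered implementations.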