Introduction to Xilinx FPGAs
Xilinx is a leading vendor of field programmable gate arrays (FPGAs), complex programmable logic devices (CPLDs), and adaptive compute acceleration platforms (ACAPs). Founded in 1984, Xilinx invented the FPGA product category and continues to be the market leader after over 30 years, accounting for nearly 50% of worldwide FPGA sales.
Some of the key milestones in Xilinx’s history include:
- 1984 – Company founded; its XC2064, the world’s first commercial FPGA, followed in 1985
- 1990 – Launched XC4000 family, extending its SRAM-based FPGA architecture
- 1998 – Introduced Virtex family for high-performance applications
- 2003 – Created the EDK design tool suite
- 2010 – Launched 7 series FPGAs using 28nm process
- 2011 – Acquired AutoESL for high-level synthesis capabilities
- 2015 – Unveiled UltraScale+ families on 16nm FinFET, following the 20nm UltraScale architecture
- 2019 – Released Versal ACAP, the industry’s first adaptive compute acceleration platform
- 2019 – Acquired Solarflare to bolster data center networking
Over three decades, Xilinx has consistently introduced innovations in FPGA architectures, design tools, and software programmability. Its FPGAs power a diverse range of applications, from data center networking to automotive driver assistance to wireless base stations.
Xilinx organizes its FPGA product families into three tiers:
- Entry-level FPGAs – Lowest cost devices for simple logic functions.
- Mid-range FPGAs – Balance of cost and capabilities for most applications.
- High-end FPGAs – Highest performance, logic density and features.
Within each generation, there are multiple product families to address different application needs. The latest product families include:
- Versal ACAP – Adaptive compute acceleration platform combining scalar engines, adaptable engines, and intelligent engines.
- Virtex UltraScale+ – Flagship high-end family with highest system performance.
- Zynq UltraScale+ MPSoC – Integrates ARM application and real-time processors with FPGA programmability.
- Kintex UltraScale+ – Mid-range balancing capability, power and cost.
- Artix UltraScale+ – Lowest cost UltraScale+ family, with essential performance for cost-sensitive and embedded designs.
- Spartan – Entry-level low-density FPGAs for simple logic functions.
Xilinx remains firmly committed to pushing the boundaries of FPGA technology through industry-leading research and development. With a rich history of innovation and leadership, Xilinx FPGAs power diverse applications across every major industry.
In this article, we will take a deep dive into Xilinx’s latest FPGA architectures and the design tools used to program them.
Xilinx FPGA Architecture Overview
At a high level, all Xilinx FPGAs share a common programmable logic architecture consisting of:
- Configurable Logic Blocks (CLBs) – Basic building blocks of logic and routing that implement logic functions.
- Input/Output Blocks (IOBs) – Periphery blocks supporting external I/O interfaces.
- Programmable Interconnect – Routing that connects the CLBs and IOBs.
- Clock Management Tiles (CMTs) – Clock synthesis blocks such as MMCMs and PLLs (digital clock managers in older families), together with clock buffers and multiplexers.
- Support Blocks – Specialty hardened blocks like DSPs, RAMs, processors.
- Configuration Memory – Stores FPGA programming bitstream.
This generic architecture is then customized and optimized differently for each FPGA product family to achieve specific capabilities. The programmable logic fabric along with hardened support blocks is tailored for each family’s target applications.
Over the years, Xilinx FPGA families have migrated across different manufacturing process nodes to benefit from advances in lithography and transistor densities. Newer process nodes enable higher logic capacity, performance, lower power consumption, and smaller die sizes.
The following sections dive deeper into the architectures of the latest Xilinx FPGA families.
Flagship Virtex UltraScale+ Architecture
The Virtex UltraScale+ family represents the highest capability and performance FPGAs from Xilinx. Built on a leading-edge 16nm FinFET+ process, these FPGAs target demanding applications like data center networking, 5G wireless, test & measurement, and defense systems.
Some key characteristics of Virtex UltraScale+ FPGAs include:
- Advanced 16nm FinFET+ process enabling up to roughly 9 million system logic cells
- Optional in-package high bandwidth memory (HBM) on Virtex UltraScale+ HBM devices
- GTY serial transceivers up to 32.75 Gbps, with 58 Gbps PAM4 transceivers on Virtex UltraScale+ 58G devices
- Hardened blocks for 100G Ethernet and PCIe
- Sophisticated power management and 3D-IC packaging innovations
The basic building block of the Virtex UltraScale+ FPGA fabric is the configurable logic block (CLB). Each CLB includes:
- 8 LUTs for combinatorial logic
- 16 flip-flops for sequential logic
- Arithmetic carry logic
- Distributed RAM and shift register capabilities
By connecting CLBs through the programmable routing, designers can construct large complex logic functions. Virtex UltraScale+ utilizes 6-input LUTs which can also be fractured into two 5-input LUTs for higher utilization.
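The LUT behavior described above can be sketched in a few lines of Python. This is a simplified behavioral model, not vendor code: it assumes a LUT is a 64-bit truth table indexed by its inputs, and that the fractured mode reads the lower and upper 32-bit halves with the same five inputs. The function names and the `init` parameter are illustrative.

```python
def lut_address(inputs):
    """Pack input bits into a table index; inputs[0] is the least significant bit."""
    return sum((bit & 1) << i for i, bit in enumerate(inputs))


def lut6(init, inputs):
    """Evaluate a 6-input LUT whose 64-bit truth table is `init`."""
    if len(inputs) != 6:
        raise ValueError("lut6 expects exactly 6 inputs")
    return (init >> lut_address(inputs)) & 1


def fractured_lut(init, inputs):
    """Evaluate the same 64 bits as two 5-input LUTs sharing their inputs.

    Assumption of this model: the lower 32 bits hold the first function
    and the upper 32 bits hold the second.
    """
    if len(inputs) != 5:
        raise ValueError("fractured_lut expects exactly 5 inputs")
    addr = lut_address(inputs)
    first = (init >> addr) & 1
    second = (init >> (32 + addr)) & 1
    return first, second


# Pack a 5-input XOR (lower half) and a 5-input AND (upper half) into one LUT.
XOR5 = sum((bin(a).count("1") & 1) << a for a in range(32))
AND5 = 1 << 31
INIT = (AND5 << 32) | XOR5

print(fractured_lut(INIT, (1, 0, 0, 0, 0)))  # (1, 0)
```

Packing two small functions into one physical LUT is what lets the tools raise utilization when neither function needs all six inputs.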
To support a wide variety of interfaces, Virtex UltraScale+ includes high-performance I/O banks that support a broad range of single-ended and differential I/O standards. The I/O logic provides source-synchronous capabilities such as serializer/deserializer logic and programmable delay lines for per-bit phase alignment. Multi-gigabit serial links are handled by the dedicated serial transceivers rather than by the general-purpose I/O.
Flexible clock management is critical for Xilinx FPGAs. In Virtex UltraScale+, each clock management tile (CMT) sits next to an I/O bank and contains one mixed-mode clock manager (MMCM) and two phase-locked loops (PLLs) for clock synthesis, buffering, and deskew. The clocking blocks enable zero-delay buffering, frequency synthesis, jitter filtering, and phase shifting.
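Frequency synthesis in a PLL-style block generally follows a simple relationship: the input clock is multiplied and divided to produce a VCO frequency, which is then divided again for each output. The sketch below illustrates that arithmetic. The parameter names (`mult`, `div`, `out_div`) and the default VCO range are assumptions for illustration only; real limits depend on the device and speed grade and should be taken from the data sheet.

```python
from fractions import Fraction


def mmcm_frequencies(f_in_mhz, mult, div, out_div, vco_range_mhz=(800, 1600)):
    """Return (vco_mhz, out_mhz) for f_vco = f_in * mult / div, f_out = f_vco / out_div.

    vco_range_mhz is a placeholder range, not a guaranteed device limit.
    """
    vco = Fraction(f_in_mhz) * Fraction(mult) / Fraction(div)
    low, high = vco_range_mhz
    if not low <= vco <= high:
        raise ValueError(f"VCO frequency {float(vco)} MHz is outside {vco_range_mhz}")
    return float(vco), float(vco / Fraction(out_div))


# 100 MHz reference, multiply by 12, no input division, divide output by 4.
print(mmcm_frequencies(100, 12, 1, 4))  # (1200.0, 300.0)
```

In practice the design tools pick these settings, but the arithmetic is useful when checking whether a target frequency is reachable from a given reference clock.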
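For embedded memory needs, Virtex UltraScale+ supplements conventional 36Kb block RAM with dedicated UltraRAM blocks. Each UltraRAM block is a fixed 288Kb memory organized as 4K x 72, and adjacent blocks can be cascaded to build deeper memories; the largest devices provide several hundred megabits of UltraRAM in total. UltraRAM offers far more capacity per block than block RAM and, compared with external memory, lower latency and reduced power.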
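A quick way to size an on-chip memory is to count how many blocks it consumes. The sketch below assumes the commonly documented UltraRAM organization of 4,096 words by 72 bits (288Kb per block); both dimensions are parameters so other block shapes can be tried. The function names are illustrative.

```python
import math

URAM_DEPTH = 4096  # words per block (assumed organization)
URAM_WIDTH = 72    # bits per word (assumed organization)


def blocks_needed(depth, width, block_depth=URAM_DEPTH, block_width=URAM_WIDTH):
    """Blocks required for a depth x width memory built by tiling whole blocks."""
    if depth <= 0 or width <= 0:
        raise ValueError("depth and width must be positive")
    return math.ceil(depth / block_depth) * math.ceil(width / block_width)


def capacity_kb(blocks, block_depth=URAM_DEPTH, block_width=URAM_WIDTH):
    """Raw capacity in kilobits (1 Kb = 1024 bits) of a number of blocks."""
    return blocks * block_depth * block_width // 1024


print(capacity_kb(1))            # 288
print(blocks_needed(16384, 64))  # 4
```

Note that a 64-bit-wide memory still occupies full 72-bit blocks, so the unused width is wasted unless it is used for parity or ECC bits.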
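For high-performance digital signal processing, the largest Virtex UltraScale+ devices contain up to 12,288 DSP48E2 slices. Each slice is built around a 27 x 18 multiplier and a 48-bit accumulator, and supports multiply-accumulate operations, wide bus multiplexing, barrel shifting, and more. Floating-point arithmetic is not native to a single slice; it is built from several slices plus fabric logic.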
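The multiply-accumulate operation at the heart of a DSP slice can be modeled in software. The sketch below assumes a signed 27 x 18 multiplier feeding a 48-bit two's-complement accumulator, which is how the DSP48E2 is usually described; the widths are parameters, and the function names are illustrative rather than part of any vendor API.

```python
def to_signed(value, bits):
    """Wrap an integer into a signed two's-complement value of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mac(samples, coeffs, a_bits=27, b_bits=18, acc_bits=48):
    """Multiply-accumulate with operand range checks and accumulator wraparound."""
    acc = 0
    for a, b in zip(samples, coeffs):
        if not -(1 << (a_bits - 1)) <= a < (1 << (a_bits - 1)):
            raise ValueError(f"operand {a} does not fit in {a_bits} bits")
        if not -(1 << (b_bits - 1)) <= b < (1 << (b_bits - 1)):
            raise ValueError(f"operand {b} does not fit in {b_bits} bits")
        acc = to_signed(acc + a * b, acc_bits)
    return acc


# A three-tap dot product, as in a small FIR filter.
print(mac([1, 2, 3], [4, 5, 6]))  # 32
```

The wraparound in `to_signed` mirrors what a fixed-width hardware accumulator does on overflow, which is why filter designs budget guard bits in the accumulator.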
Soft Processor Cores
To enable embedded software programming, Virtex UltraScale+ can implement MicroBlaze soft processor cores in the FPGA fabric. Developers can customize MicroBlaze to meet their specific performance, power, and area requirements.
Hard Block for 100G Ethernet
For 100G Ethernet applications, selected Virtex UltraScale+ variants incorporate a hard 100G Ethernet block. This eliminates soft logic implementation complexity by providing a fully-hardened 100G Ethernet MAC+PCS+Gearbox+FEC core.
Configuration and Security
For configuration, Virtex UltraScale+ devices typically load their bitstream from external flash, such as SPI or BPI flash, or over JTAG. Security features such as 256-bit AES-GCM encryption and RSA authentication help protect designs. Encryption keys can be stored in one-time programmable eFUSE bits or in battery-backed RAM.
With stacked silicon interconnect (SSI) technology, multiple Virtex UltraScale+ dies are mounted side by side on a silicon interposer within a single package. This enables logic capacity and die-to-die bandwidth that would not be possible with a single monolithic die.
By optimizing the FPGA architecture for high-performance applications, Virtex UltraScale+ FPGAs achieve system capability beyond previous generations. The 16nm process allows for up to roughly 9 million system logic cells along with ample hard blocks for memory, DSP, I/O, and connectivity. For developers needing to maximize logic capacity and performance, Virtex UltraScale+ is the highest-capacity option in the UltraScale+ portfolio.
Zynq UltraScale+ MPSoC Architecture
The Zynq UltraScale+ MPSoC combines ARM application and real-time processor cores with Xilinx programmable logic fabric on a single chip. This enables software programmers and hardware designers to leverage their respective skills on one platform.
Some key attributes of the Zynq UltraScale+ MPSoC include:
- Quad-core ARM Cortex-A53 up to 1.5 GHz
- Dual-core ARM Cortex-R5 real-time processors
- ARM Mali-400 MP2 GPU for graphics processing (EG and EV devices)
- Video codec unit for 4K video processing (EV devices)
- Xilinx FPGA programmable logic up to roughly 1.1 million system logic cells
- Advanced peripherals for connectivity, security, and more
By integrating processor cores in an FPGA, the Zynq MPSoC allows developers to adapt the system architecture exactly to their needs. Software engineers can leverage familiar embedded programming tools for the ARM processors while hardware engineers can program the FPGA fabric to create custom accelerators and I/O tailored for the application.
Application Processor Unit (APU)
The application processor unit (APU) contains the quad-core ARM Cortex-A53 cluster along with its caches.
The Cortex-A53 cores run at up to 1.5GHz, executing general purpose embedded software such as Linux applications. The dual-core Cortex-R5 sits in a separate real-time processor unit for workloads requiring deterministic execution.
The surrounding processing system (PS) includes on-chip memory along with controllers for DDR4, DDR3, PCIe, USB, SATA, Gigabit Ethernet, and CAN bus. There is also a security engine for enabling hardware-based security use cases.
Real-Time Processor Unit (RPU)
The real-time processor unit (RPU) is built around the dual-core Cortex-R5 to provide a deterministic execution environment for real-time applications. It has tightly coupled instruction and data memories to ensure low-latency access.
The RPU can operate in lock-step mode for safety critical systems requiring redundancy. External real-time peripherals can be directly accessed by the RPU through its dedicated interfaces.
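The idea behind lock-step operation can be illustrated with a small software analogy: two redundant copies execute the same step on the same input, and any disagreement is flagged as a fault. This is only a conceptual sketch; the hardware compares the two cores cycle by cycle, and the names `run_lockstep`, `LockstepError`, and `fault_injector` are hypothetical.

```python
class LockstepError(RuntimeError):
    """Raised when the redundant copies disagree."""


def run_lockstep(step, state, inputs, fault_injector=None):
    """Run `step` twice per input and compare the resulting states.

    fault_injector(cycle, state) optionally corrupts the shadow copy,
    which lets the comparison logic be exercised in a test.
    """
    primary = shadow = state
    for cycle, value in enumerate(inputs):
        primary = step(primary, value)
        shadow = step(shadow, value)
        if fault_injector is not None:
            shadow = fault_injector(cycle, shadow)
        if primary != shadow:
            raise LockstepError(f"mismatch at cycle {cycle}")
    return primary


# A running sum executed redundantly with no faults.
print(run_lockstep(lambda s, x: s + x, 0, [1, 2, 3]))  # 6
```

The trade-off is that lock-step gives up the second core's throughput in exchange for fault detection, which is why it is typically reserved for safety-critical workloads.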
Programmable Logic (PL)
Up to roughly 1.1 million system logic cells make up the programmable logic in the largest Zynq UltraScale+ MPSoCs. The PL is built from the same UltraScale+ fabric used in the Kintex and Virtex UltraScale+ FPGAs. It provides abundant CLBs, block RAM, DSP slices, and MMCMs, with UltraRAM on many devices.
Developers can leverage the PL to create custom accelerators, I/O processors, video/graphics pipelines, and any other application-specific circuits. The PL seamlessly integrates with the APU and RPU using low latency coherency ports.
A host of specialized peripherals and hard blocks enable Zynq UltraScale+ MPSoC connectivity (exact counts vary by device):
- Four PS-GTR transceivers up to 6Gbps, plus PL transceivers up to 16.3Gbps (GTH) or 32.75Gbps (GTY)
- PCIe Gen2 in the processing system, with PCIe Gen3 and 100G Ethernet blocks in the PL of larger devices
- DisplayPort source interface in the processing system
- 2x USB 3.0, 4x Gigabit Ethernet, 2x CAN bus
- 1MB L2 cache with ECC protection
- Secure boot with AES/SHA encryption blocks
For storage, the programmable logic can be used to integrate SAS, SATA, SD cards, NAND flash controllers and more.
Software and Tools
The Zynq UltraScale+ design flow supports both the embedded software development and FPGA hardware design:
- Xilinx Software Development Kit (SDK) – C/C++ embedded software tools for the APU and RPU, since superseded by the Vitis unified software platform.
- Vivado Design Suite – Synthesis and logic implementation for the PL, with high-level synthesis available through Vivado HLS.
- PetaLinux – Linux platform generation for the APU.
- Bare-metal libraries – Low-level standalone drivers and software libraries.
With these tools, developers can fully unlock the potential of the Zynq UltraScale+ MPSoC combining real-time processing, graphics, hardware acceleration, and connectivity all on a single chip.
Kintex UltraScale+ Mid-Range FPGAs
Occupying Xilinx’s mid-range portfolio, Kintex UltraScale+ FPGAs balance capability, cost and power efficiency for the majority of applications. Built on a 16nm FinFET process, Kintex UltraScale+ provides up to roughly 1.1 million system logic cells along with ample memory, DSP, and I/O for mid-range applications.
Some highlights of Kintex UltraScale+ FPGAs:
- 16nm FinFET process for optimal power/performance
- Up to roughly 1.1M system logic cells and tens of megabits of block RAM
- UltraRAM blocks for large on-chip memory needs
- High-speed serial I/O up to 32.75 Gbps
- Sophisticated power management techniques
- High-reliability solutions available
Kintex UltraScale+ utilizes the same CLB architecture as Virtex UltraScale+ with 6-input LUTs as the basic logic element. While not as high density as Virtex, the Kintex family still provides abundant logic resources, up to roughly 1.1M system logic cells, for mid-range applications.
For embedded memory, Kintex UltraScale+ combines 36Kb block RAMs with 288Kb UltraRAM blocks. On the largest devices each memory type contributes tens of megabits for FIFOs, buffers, and other memory needs; exact block counts are listed per device in the product tables.
High speed signal processing is enabled with up to about 3,500 DSP48E2 slices, depending on the device. Each DSP slice is built around a 27×18 multiplier and supports multiply-accumulate, barrel shifting, wide multiplexing, and many other arithmetic operations.
Supporting a wide range of standards, Kintex UltraScale+ offers several hundred user I/O pins on its largest packages, alongside serial transceivers running at up to 32.75 Gbps. The multi-standard I/Os support both differential and single-ended interfaces.
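Line rate is not the same as usable payload, because serial links spend some bits on line encoding. The sketch below estimates payload bandwidth for a group of lanes. The encoding table and function name are illustrative, and protocol overhead above the line code (framing, headers, FEC) is deliberately ignored.

```python
from fractions import Fraction

# Fraction of transmitted bits that carry payload for common line codes.
ENCODINGS = {
    "8b10b": Fraction(8, 10),
    "64b66b": Fraction(64, 66),
    "none": Fraction(1),
}


def payload_gbps(lanes, line_rate_gbps, encoding="64b66b"):
    """Aggregate payload bandwidth in Gbps after line-encoding overhead."""
    rate = Fraction(str(line_rate_gbps))
    return float(lanes * rate * ENCODINGS[encoding])


# Four lanes at 25.78125 Gbps with 64b/66b coding carry 100 Gbps of payload.
print(payload_gbps(4, 25.78125))  # 100.0
```

The same calculation shows that a single 32.75 Gbps lane with 64b/66b coding carries a little under 31.8 Gbps of payload.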
To optimize power efficiency, Kintex UltraScale+ incorporates abundant power saving techniques including fine-grained clock gating, multi-voltage support, and optimized logic mappings to reduce switching power.
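The effect of these techniques can be estimated with the standard first-order model for dynamic power, P = α · C · V² · f, where α is the switching activity, C the switched capacitance, V the supply voltage, and f the clock frequency. The sketch below applies that formula. All numeric values are illustrative placeholders, not device data; real estimates should come from the vendor's power estimation tools.

```python
def dynamic_power_w(activity, capacitance_f, voltage_v, frequency_hz):
    """First-order dynamic power: activity * C * V^2 * f, in watts."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


# Placeholder operating point: 20% activity, 1 nF switched, 0.85 V, 300 MHz.
baseline = dynamic_power_w(0.20, 1e-9, 0.85, 300e6)

# Clock gating lowers effective activity; a lower supply cuts power quadratically.
gated = dynamic_power_w(0.10, 1e-9, 0.85, 300e6)
low_voltage = dynamic_power_w(0.20, 1e-9, 0.72, 300e6)

print(f"baseline {baseline:.4f} W, gated {gated:.4f} W, low voltage {low_voltage:.4f} W")
```

Because voltage enters squared, a modest supply reduction saves proportionally more than the same percentage reduction in frequency or activity.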
For design security, configuration bitstreams can be encrypted using 256-bit AES-GCM. On-chip decryption logic decrypts and authenticates the bitstream as it is loaded. The decryption keys are stored in one-time programmable eFUSE memory or in battery-backed RAM.
By balancing high performance and capability with power efficiency and mainstream pricing, Kintex UltraScale+ hits the sweet spot for many applications in wireless infrastructure, aerospace and defense, wired networking, and test and measurement systems. For designers who need more capability than low cost FPGAs offer but for whom the highest-end devices would be overkill, Kintex UltraScale+ offers a compelling mid-range solution.
Low Cost Artix UltraScale+ FPGAs
Occupying the lowest cost segment of Xilinx’s 16nm UltraScale+ portfolio, Artix FPGAs maximize value for high-volume, cost-sensitive applications. Artix UltraScale+ delivers essential capability for mainstream designs but omits some of the highest-end features to reduce cost.
Some attributes of Artix UltraScale+ FPGAs:
- Low cost 16nm FinFET optimized design
- Up to roughly 0.3 million system logic cells
- Low power transceivers up to 16.3 Gbps
- Optimized for cost sensitivity
Programmable Logic Cells
Artix utilizes the same CLB architecture as the higher UltraScale+ families but in smaller quantities, with up to roughly 0.3 million system logic cells. This balances logic density with cost by right-sizing for mainstream designs.
With on the order of 10Mb of block RAM on the largest device, Artix delivers sufficient embedded memory for most applications. However, the larger UltraRAM blocks found in higher-end families are not available in Artix to reduce cost.
To support high-speed interfaces, Artix incorporates a limited number of transceiver channels (the count depends on device and package) supporting data rates up to 16.3 Gbps. While not as high as larger families, this is ample performance for mainstream applications.
External Memory
Artix UltraScale+ devices do not integrate high bandwidth memory (HBM); within the UltraScale+ portfolio, in-package HBM is offered on Virtex UltraScale+ HBM devices. Artix designs that need large or fast memory rely on external DRAM attached through the programmable I/O.
Artix utilizes the same power saving innovations as larger UltraScale+ FPGAs including fine-grained clock gating and multi-voltage support. Static power is minimized through transistor optimization specific for high-volume production.
By focusing the architecture on high-volume mainstream designs rather than maximizing high-end features, the Artix UltraScale+ FPGA provides a cost-optimized solution for applications needing moderate density, performance, and power. For applications like wireless radio, driver assistance, IoT systems, and embedded vision, Artix offers compelling value.
Entry-Level Spartan FPGAs
At the low end of Xilinx’s portfolio sit the Spartan series FPGAs, optimized for simple logic integration at minimal cost. Spartan FPGAs offer essential programmability for low-complexity designs that do not require advanced features.
Here are some of the key attributes of Spartan FPGAs:
- Lowest cost Xilinx FPGA option
- Densities from 3K to 150K logic cells
- Single-core MicroBlaze soft processors
- Up to 204 User I/O pins
- Option for integrated PCIe, Ethernet, USB
- Automotive, industrial, IoT applications
Spartan FPGAs utilize the same basic 6-input LUT architecture as larger families but with fewer logic blocks and fewer hardened features. The number of logic cells ranges from just 3K up to 150K.
With up to 204 user I/O pins, Spartan provides sufficient connectivity for low-complexity systems. The multi-standard I/Os support common single-ended standards such as LVCMOS and differential standards such as LVDS.
For programming, Spartan devices are SRAM-based and are typically configured at power-up from external SPI flash or through JTAG. Advanced configuration schemes like PCI