Field Programmable Gate Array

Pinned resources

  • Textbooks:
    • Vanderbauwhede, "High-Performance Computing Using FPGAs"
    • Reddit r/FPGA

FPGA is as close to the final frontier of electronics as I can possibly get, so why not dive right in? The chips used in the lab are mainly Lattice MachXO 1200 chips, with designs written in Verilog and built using the Diamond software. An article on how to use this piece of tech is to be populated later.

Good references on FPGA tutorials

    • Written to be more like a cheatsheet/overview
    • Great for example projects and sample implementation
    • Covers high-level topics on digital logic, FPGA design principles and HDL
    • Written in blog article-style
    • Goes into detail on the development of FPGAs as well as historical design choices
    • Some examples are a little incomplete in terms of illustrations

A preamble: This page is designed to be quick summaries of what I learnt and read up on, especially with regard to theory. Please don't expect it to be a full-blown tutorial - there are websites for that. Example projects (especially those focused on FPGAs) should go into the corresponding FPGA logs instead. Some effort is put into portraying a somewhat consistent linear learning flow, with appropriate resources.

What and why FPGAs

FPGAs are programmable digital logic chips: logic functions are compiled and downloaded into the FPGA, which then implements them in hardware. They thus typically run much faster than the same circuit built from discrete components, owing to their more compact size.

There are four relatively big players in the FPGA market:

  1. Xilinx (AMD) invented the FPGA; its devices typically have open architectures and high performance
  2. Altera (Intel) focuses more on software usability, but is slightly less open and performant
  3. Lattice
  4. Actel

Comparisons to other technologies

FPGAs are usually compared to microcontrollers, though the two fill different niches.

Microcontrollers use a CPU architecture as an abstraction for general-purpose computing, i.e. they come with a predefined instruction set as well as pinouts, and operate serially by design. FPGAs in contrast are configured into logic circuits that can be arbitrarily programmed. These circuits can operate in parallel and thus work nicely at higher rates, but can quickly run into physical limitations as the circuits grow larger.

FPGAs also leave the tradeoff between speed and cost open until very late in the design cycle: a parallel design typically spends more area to gain speed, while a serial design trades speed for a more compact implementation.
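To make the area/speed tradeoff concrete, here is a rough Python sketch (not HDL) of the same addition done two ways: a ripple-carry adder that instantiates one full adder per bit, and a bit-serial adder that reuses a single full adder over several clock cycles. The function names and the "cost" dictionaries are my own bookkeeping for illustration, not figures from any vendor tool.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum bit, carry out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def add_bits(x, y, width):
    """Add two integers bit by bit, keeping only `width` result bits."""
    carry, total = 0, 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total


def add_parallel(x, y, width):
    """Parallel design: `width` full adders side by side, one clock cycle."""
    return add_bits(x, y, width), {"full_adders": width, "cycles": 1}


def add_serial(x, y, width):
    """Serial design: one full adder reused, one result bit per clock cycle."""
    return add_bits(x, y, width), {"full_adders": 1, "cycles": width}


parallel_sum, parallel_cost = add_parallel(13, 7, 8)
serial_sum, serial_cost = add_serial(13, 7, 8)
```

Both versions compute the same sum; only the assumed resource count and cycle count differ, which is the choice that can be deferred until late in an FPGA design.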

More broadly, FPGAs belong to a class of programmable digital circuits called Programmable Logic Devices (PLDs), which range from:

These chips can also be categorised by the number of transistors... but nuff said.

Diagrams for each architecture

The following writeup gives a pretty good high-level overview.

FPGAs can be updated after manufacturing, as opposed to ASICs, which cannot; see Intel's FDIV bug.

Internal structure

The primary components of an FPGA are:

On top of these, typically more than 80% of the chip is dedicated to the auxiliary circuits for programming said logic blocks as well as the programmable interconnects between these blocks.

Logic blocks

Internal implementations of FPGAs are highly vendor-specific, with different models also carrying specialized primitives for specific use cases. We discuss these in a generic manner.

Logic blocks are fundamentally implemented as a series of look-up tables (LUTs). A two-input LUT (LUT2) can implement any two-input logic gate, noting that the output gate is also configurable (Xilinx and Altera FPGAs typically have one output gate per LUT). LUTs are implemented using a combination of SRAM bits (holding the LUT-mask in configuration memory, CRAM) and a multiplexer (MUX) - in the LUT3 below, $$a$$, $$b$$, and $$c$$ are the inputs to the LUT, and the LUT-mask of 01000111 specifies the output for each corresponding input combination:
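The LUT-as-memory idea can be sketched in a few lines of Python. This is only a behavioural model: the inputs form an address, and the MUX picks the matching mask bit. The bit ordering is an assumption on my part (mask read left to right starting at address 0, with the first input as the most significant address bit); the actual ordering depends on the diagram and the vendor.

```python
def lut(mask, *inputs):
    """Evaluate a k-input LUT whose contents are given by a 2**k-bit mask string."""
    assert len(mask) == 2 ** len(inputs), "mask needs one bit per input combination"
    address = 0
    for bit in inputs:  # assumed ordering: first input is the most significant bit
        address = (address << 1) | (bit & 1)
    return int(mask[address])  # the MUX selects one stored SRAM bit


# LUT3 with the mask from the text, evaluated for every (a, b, c)
lut3_table = {
    (a, b, c): lut("01000111", a, b, c)
    for a in (0, 1) for b in (0, 1) for c in (0, 1)
}

# A LUT2 becomes a different gate just by changing its mask
and_gate = [lut("0001", a, b) for a in (0, 1) for b in (0, 1)]
xor_gate = [lut("0110", a, b) for a in (0, 1) for b in (0, 1)]
```

Under this ordering, reconfiguring the FPGA amounts to rewriting the mask bits, while the MUX structure stays fixed.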

LUTs can also be composed from smaller LUTs and a MUX - the diagram below on the left shows a configuration for either two independent LUT5s or one LUT6. The diagram on the right combines the two concepts:
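As a sanity check on the composition, the sketch below builds a LUT6 out of two LUT5s and a 2:1 MUX, and compares it against a plain 64-entry lookup. It assumes the same mask ordering as a simple left-to-right address (first input most significant), and that the first input drives the MUX; real devices may wire this differently.

```python
from itertools import product


def lut(mask, *inputs):
    """k-input LUT: inputs form an address into the mask (first input = MSB)."""
    address = 0
    for bit in inputs:
        address = (address << 1) | (bit & 1)
    return int(mask[address])


def lut6_from_lut5s(mask64, a, b, c, d, e, f):
    """LUT6 built from two LUT5s sharing inputs b..f, with `a` driving a MUX."""
    out_when_a0 = lut(mask64[:32], b, c, d, e, f)  # first LUT5
    out_when_a1 = lut(mask64[32:], b, c, d, e, f)  # second LUT5
    return out_when_a1 if a else out_when_a0       # 2:1 MUX


mask = format(0x123456789ABCDEF0, "064b")  # arbitrary 64-bit LUT-mask
matches = all(
    lut6_from_lut5s(mask, *bits) == lut(mask, *bits)
    for bits in product((0, 1), repeat=6)
)
```

If the MUX is bypassed, the two halves are simply two independent LUT5s over the same five inputs, which matches the either/or configuration described above.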

More complex LUTs can be designed to optimize for area/speed, e.g. Altera's Adaptive Logic Module (ALM), which provides an adaptive combination of LUT3s and LUT4s shared between two LUT5-equivalent logic functions, with additional registers and adders:

LUTs can be connected to flip-flops, together termed a slice. Flip-flops allow signals to be held across clock cycles. There are typically different types of slices in an FPGA (for Xilinx FPGAs: SLICEM, SLICEL, and SLICEX, which support different functionality for better performance in specialized cases such as arithmetic). Tying them all together is the routing switch matrix, which interconnects the different Configurable Logic Blocks (CLBs), each consisting of multiple slices:

Interconnecting configurable logic blocks

Each CLB contains the LUT, whose output is fanned out both directly to the output and into the register; the output switch selects which of the two is driven out.
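A toy model of that LUT-plus-register arrangement, again in Python rather than HDL. The class and attribute names (`LogicCell`, `use_register`) are made up for this sketch, and a real CLB has far more to it (carry chains, multiple LUTs, clock enables and so on).

```python
def lut(mask, *inputs):
    """k-input LUT: inputs form an address into the mask (first input = MSB)."""
    address = 0
    for bit in inputs:
        address = (address << 1) | (bit & 1)
    return int(mask[address])


class LogicCell:
    """One LUT feeding a flip-flop, with an output switch choosing between them."""

    def __init__(self, mask, use_register):
        self.mask = mask
        self.use_register = use_register  # output switch setting
        self.q = 0                        # flip-flop state, assumed to reset to 0

    def step(self, *inputs):
        """Simulate one clock cycle and return the output seen during it."""
        lut_out = lut(self.mask, *inputs)
        out = self.q if self.use_register else lut_out
        self.q = lut_out  # the flip-flop captures the LUT output at the clock edge
        return out


stimulus = [(0, 1), (1, 1), (1, 0)]
combinational = LogicCell("0110", use_register=False)  # XOR, direct output
registered = LogicCell("0110", use_register=True)      # XOR, via the register

direct_outputs = [combinational.step(a, b) for a, b in stimulus]
delayed_outputs = [registered.step(a, b) for a, b in stimulus]
```

The registered version produces the same values one clock cycle later, which is what lets a design be broken into pipeline stages.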

Design flow

Typically conducted in a series of steps:

  1. Design entry: Translating the design idea into a computerized representation, in the form of a Hardware Description Language (HDL), commonly either Verilog or VHDL (Very High Speed Integrated Circuit HDL).
  2. Synthesis: Generates a netlist from the HDL using primitives supplied by the FPGA. Usually also performs logic optimization, register load balancing (retiming) and other timing performance optimizations (buffering / replication / pipelining / multiplexing).
  3. Map, Translate, Place and route: The placer determines a location on the chip for each primitive, while the router interconnects the primitives so as to satisfy timing constraints. Static Timing Analysis (STA), where timing performance is quantified, is the most important part.
  4. Bitstream generation: The FPGA program is stored in non-volatile flash memory, to be programmed onto the FPGA's SRAM.
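To give a feel for what STA quantifies in step 3, here is a heavily simplified Python sketch: it finds the latest arrival time along any path through a small netlist and compares it to the clock period. The netlist, the delays, and the fixed per-connection routing delay are all invented for illustration; real tools also account for setup/hold times, clock skew, and placement-dependent routing.

```python
# Hypothetical netlist: primitive -> (delay in ps, primitives driving its inputs)
NETLIST = {
    "ff_in":  (0,   []),
    "lut_a":  (500, ["ff_in"]),
    "lut_b":  (500, ["ff_in"]),
    "lut_c":  (500, ["lut_a", "lut_b"]),
    "ff_out": (0,   ["lut_c"]),
}
ROUTE_DELAY_PS = 300  # assumed fixed delay for every routed connection


def arrival_time(node, netlist=NETLIST):
    """Latest time (ps) at which the output of `node` settles."""
    delay, drivers = netlist[node]
    if not drivers:
        return delay
    latest_input = max(arrival_time(d, netlist) + ROUTE_DELAY_PS for d in drivers)
    return latest_input + delay


def slack(node, clock_period_ps, netlist=NETLIST):
    """Positive slack means the path meets timing; negative means it fails."""
    return clock_period_ps - arrival_time(node, netlist)


critical_path_ps = arrival_time("ff_out")
slack_at_400mhz = slack("ff_out", 2500)   # 2500 ps clock period
slack_at_667mhz = slack("ff_out", 1500)   # roughly 1500 ps clock period
```

A negative slack is what sends the design back to the placer and router (or back to synthesis for retiming or pipelining).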

Synthesis and place-and-route tools are typically vendor-specific, though there are also open-source varieties for Lattice (Yosys, nextpnr) and Xilinx (Project X-Ray), as well as officially-endorsed netlist generation tools (e.g. RapidWright for Xilinx).

This article goes into much further detail on the testing and simulation stages.

Description from CG3207 Computer Architecture

High-level design

Two strategies: Register Transfer Level (RTL) or Gate Level (GL) modelling. In RTL, the behaviours of components are modelled, e.g. what to do when a rising clock edge is detected. In GL, specific components (e.g. NOT gates or D flip-flops) are specified and directly connected.
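The difference is easiest to see on a tiny circuit. Below is a Python analogy (not actual HDL) of a toggle circuit described both ways: the RTL-style class states what happens on a rising edge, while the GL-style class wires a NOT gate into a D flip-flop. All class names here are my own and purely illustrative.

```python
class ToggleRTL:
    """RTL style: describe the behaviour on each rising clock edge."""

    def __init__(self):
        self.q = 0

    def rising_edge(self):
        self.q = 1 - self.q  # "on a rising edge, invert q"


def not_gate(x):
    """Gate-level primitive: inverter."""
    return x ^ 1


class DFlipFlop:
    """Gate-level primitive: captures d on the rising clock edge."""

    def __init__(self):
        self.q = 0

    def rising_edge(self, d):
        self.q = d


class ToggleGL:
    """GL style: explicitly connect q -> NOT gate -> d of a D flip-flop."""

    def __init__(self):
        self.dff = DFlipFlop()

    @property
    def q(self):
        return self.dff.q

    def rising_edge(self):
        d = not_gate(self.dff.q)  # wire from the flip-flop output through the inverter
        self.dff.rising_edge(d)


rtl, gl = ToggleRTL(), ToggleGL()
rtl_trace, gl_trace = [], []
for _ in range(3):
    rtl.rising_edge()
    gl.rising_edge()
    rtl_trace.append(rtl.q)
    gl_trace.append(gl.q)
```

Both descriptions produce the same waveform; with RTL the synthesis tool works out the gates, whereas with GL they are spelled out by hand.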