Pinned resources
FPGAs are as close to the final frontier of electronics as I can possibly get, so why not dive right in? The chips used in the lab are mainly Lattice MachXO 1200 chips, with designs written in Verilog and built using the Diamond software. An article on how to use this piece of tech is to be populated later.
Good references on FPGA tutorials
A preamble: This page is designed to be quick summaries of what I learnt and read up on, especially with regard to theory. Please don't expect it to be a full-blown tutorial - there are websites for that. Example projects (especially those focused on FPGAs) should go into the corresponding FPGA logs instead. Some effort is put into portraying a somewhat consistent linear learning flow, with appropriate resources.
References
FPGAs are programmable digital logic chips: logic functions are compiled and downloaded into the FPGA, which then implements them in hardware. They thus typically run much faster than the same logic built from discrete components, owing to their more compact size.
There are four relatively big players in the FPGA market:
FPGAs are usually compared to microcontrollers, though the two fill different niches.
Microcontrollers use a CPU architecture for abstraction in general-purpose computing, i.e. they come with a predefined instruction set as well as pinouts. They operate serially by design. FPGAs in contrast involve configuring logic circuits, which can be arbitrarily programmed. These circuits can operate in parallel and thus work nicely at higher rates, but can quickly run into physical limitations with larger circuits.
FPGAs are also flexible when it comes to deciding the tradeoff between speed and cost until very late in the design cycle - a parallel design typically spends a larger area to gain speed, while a serial design trades speed for a more compact implementation.
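To make the area/speed tradeoff concrete, here is a minimal Python sketch (a software model, not hardware code) comparing two ways of adding `width`-bit numbers. The function names and the resource counts they report are illustrative assumptions: the parallel version stands in for `width` full adders laid side by side, the serial version for a single full adder reused over `width` clock cycles.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum_bit, carry_out)."""
    total = a + b + cin
    return total & 1, total >> 1


def add_parallel(x, y, width):
    """Ripple-carry adder: `width` full adders, result ready in one cycle."""
    carry, result = 0, 0
    for i in range(width):  # each iteration models a separate piece of hardware
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, {"full_adders": width, "cycles": 1}


def add_serial(x, y, width):
    """Bit-serial adder: one full adder reused, one bit per clock cycle."""
    carry_ff, result = 0, 0  # carry_ff models a flip-flop holding the carry
    for cycle in range(width):  # each iteration models one clock cycle
        s, carry_ff = full_adder((x >> cycle) & 1, (y >> cycle) & 1, carry_ff)
        result |= s << cycle
    return result, {"full_adders": 1, "cycles": width}


if __name__ == "__main__":
    print(add_parallel(5, 6, 4))
    print(add_serial(5, 6, 4))
```

Both functions compute the same sum (modulo `2**width`); only the modelled cost differs, which is the choice an FPGA design can defer until late in the design cycle.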
More broadly, FPGAs belong to a class of programmable digital circuits called Programmable Logic Devices (PLDs), which range from:
These chips can also be categorised by the number of transistors... but nuff said.
Diagrams for each architecture
The following writeup gives a pretty good high-level overview.
This is as opposed to ASICs, which cannot be updated after manufacturing (see Intel's FDIV bug).
The primary components of an FPGA are:
On top of these, typically more than 80% of the chip is dedicated to the auxiliary circuits for programming said logic blocks as well as the programmable interconnects between these blocks.
Internal implementations of FPGAs are highly vendor-specific, with different models also carrying specialized primitives for specific use cases. We discuss these in a generic manner.
Logic blocks are fundamentally implemented as a series of look-up tables (LUTs). A two-input LUT (LUT2) can simulate any two-input logic gate, noting that the output gate is also configurable (Xilinx and Altera FPGAs typically have one output gate per LUT). LUTs are implemented using a combination of SRAM bits (holding the LUT-mask in configuration memory, CRAM) and a multiplexer (MUX) - in the LUT3 below, $$a$$, $$b$$, and $$c$$ are the inputs to the LUT, with a LUT-mask of 01000111 specifying the output for each corresponding input combination:
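The SRAM-plus-MUX idea can be sketched in a few lines of Python. This is a software model under stated assumptions: the function name `lut_eval` is made up for illustration, the first input is taken as the most significant select bit, and the mask is read left to right starting from input combination 000. Real devices may order the mask bits differently.

```python
def lut_eval(mask, *inputs):
    """Model a LUT: `mask` is the SRAM contents, `inputs` drive the MUX select lines."""
    if len(mask) != 2 ** len(inputs):
        raise ValueError("mask needs one bit per input combination")
    index = 0
    for bit in inputs:  # assumption: first input is the most significant bit
        index = (index << 1) | (bit & 1)
    return int(mask[index])  # the MUX picks one stored bit


if __name__ == "__main__":
    # LUT3 with the mask from the text
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                print(a, b, c, "->", lut_eval("01000111", a, b, c))
    # A LUT2 becomes a different gate just by changing the mask
    print("AND:", [lut_eval("0001", a, b) for a in (0, 1) for b in (0, 1)])
    print("XOR:", [lut_eval("0110", a, b) for a in (0, 1) for b in (0, 1)])
```

Reprogramming the FPGA then amounts to loading different mask bits; the MUX structure itself never changes.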
Larger LUTs can be composed from smaller LUTs and a MUX - the diagram below on the left shows a configuration that provides either two independent LUT5s or one LUT6. The diagram on the right combines the two concepts:
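As a rough illustration of the composition, the Python sketch below builds a LUT6 from two LUT5s and a 2:1 MUX and checks it against a direct 64-entry lookup. The helper names and the bit ordering (first input is the most significant select bit) are assumptions for this example, and six-input parity is just a convenient test function.

```python
from itertools import product


def lut(mask, inputs):
    """Look up `inputs` in `mask` (first input is the most significant bit)."""
    index = 0
    for bit in inputs:
        index = (index << 1) | bit
    return mask[index]


def lut6_from_lut5s(mask64, inputs):
    """LUT6 built from two LUT5s sharing five inputs, plus a 2:1 MUX."""
    sel, rest = inputs[0], inputs[1:]
    low = lut(mask64[:32], rest)    # LUT5 covering the cases where sel == 0
    high = lut(mask64[32:], rest)   # LUT5 covering the cases where sel == 1
    return high if sel else low     # the MUX, driven by the sixth input


# Example function: parity of six inputs, stored as a 64-entry mask
PARITY_MASK = [sum(bits) % 2 for bits in product((0, 1), repeat=6)]

if __name__ == "__main__":
    ok = all(
        lut6_from_lut5s(PARITY_MASK, bits) == lut(PARITY_MASK, bits)
        for bits in product((0, 1), repeat=6)
    )
    print("composed LUT6 matches direct lookup:", ok)
```

Leaving the MUX out and using each half of the mask separately corresponds to the "two LUT5s" mode described above.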
More complex LUTs can be designed to optimize for area/speed, e.g. Altera's Adaptive Logic Module (ALM), which provides an adaptive combination of LUT3s and LUT4s shared between two LUT5-equivalent logic functions, with additional registers and adders:
LUTs can be connected to flip-flops, together termed a slice. Flip-flops allow signals to be held across clock cycles. There are typically different types of slices in an FPGA (for Xilinx FPGAs, SLICEM, SLICEL, and SLICEX, which support different functionality for better performance in specialized cases such as arithmetic). Tying them all together is the routing switch matrix, which interconnects the Configurable Logic Blocks (CLBs), each consisting of multiple slices:
Each CLB contains the LUT, whose output is routed either directly to the output or through the register, with the path selected by the output switch.
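The LUT-register-switch arrangement can be modelled with a small Python class. This is a simplified sketch of a single LUT and flip-flop pair, not any vendor's actual CLB; the class name `LogicCell` and its methods are hypothetical.

```python
class LogicCell:
    """One LUT feeding a flip-flop, with a switch choosing which one drives the output."""

    def __init__(self, mask, registered):
        self.mask = mask              # LUT contents, one bit per input combination
        self.registered = registered  # output switch: True selects the register path
        self.ff = 0                   # flip-flop state, assumed to reset to 0

    def lut(self, inputs):
        index = 0
        for bit in inputs:            # assumption: first input is the most significant bit
            index = (index << 1) | bit
        return self.mask[index]

    def clock(self, inputs):
        """Rising clock edge: the flip-flop captures the current LUT output."""
        self.ff = self.lut(inputs)

    def output(self, inputs):
        """Value seen at the cell output for the given inputs."""
        return self.ff if self.registered else self.lut(inputs)


if __name__ == "__main__":
    and_mask = [0, 0, 0, 1]
    comb = LogicCell(and_mask, registered=False)
    reg = LogicCell(and_mask, registered=True)
    print(comb.output((1, 1)))  # combinational path follows the inputs immediately
    print(reg.output((1, 1)))   # registered path still shows the old flip-flop value
    reg.clock((1, 1))
    print(reg.output((0, 0)))   # value captured at the clock edge is held
```

The registered path only changes at a clock edge, which is what lets a signal persist after the inputs have moved on.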
Typically conducted in a series of steps:
Synthesis and place-and-route tools are typically vendor-specific, though there are also open-source alternatives for Lattice (Yosys with nextpnr) and Xilinx (Project X-Ray), as well as officially endorsed netlist generation tools (e.g. RapidWright for Xilinx).
This article goes into much further detail on the testing and simulation stages.
Two strategies: Register Transfer Level (RTL) or Gate Level (GL) modelling. In RTL, the behaviour of components is modelled, e.g. what to do when a rising clock edge is detected. In GL, specific components (e.g. NOT gates or D flip-flops) are specified and directly connected.
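The difference between the two styles can be loosely illustrated in Python, as an analogy only: real designs would be written in an HDL such as Verilog. Below, the same full adder is described once by its behaviour and once as explicitly wired gates; all function names are made up for this sketch.

```python
def full_adder_behavioural(a, b, cin):
    """RTL-like style: state what the block computes and leave the gates to the tools."""
    total = a + b + cin
    return total & 1, total >> 1  # (sum, carry_out)


# Gate-level style: name each primitive and wire them together by hand
def xor_gate(x, y):
    return x ^ y


def and_gate(x, y):
    return x & y


def or_gate(x, y):
    return x | y


def full_adder_gate_level(a, b, cin):
    s1 = xor_gate(a, b)         # first half adder
    c1 = and_gate(a, b)
    total = xor_gate(s1, cin)   # second half adder
    c2 = and_gate(s1, cin)
    return total, or_gate(c1, c2)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            for cin in (0, 1):
                assert full_adder_behavioural(a, b, cin) == full_adder_gate_level(a, b, cin)
    print("both descriptions agree on all 8 input combinations")
```

The behavioural version is shorter and leaves implementation choices to the synthesis tool, while the gate-level version fixes the exact structure.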