Taking cues from the LCMXO2-7000HE-B-EVN board (TQFP-144 form factor), the design included an FTDI FT2232HL USB to JTAG/RS232 chip for programming/interfacing over USB.
LCMXO2-7000HE-B-EVN schematics
A rough guide to what the MachXO2 chip can support:
The configuration flow1) in particular is illuminating with respect to the use of the `PROGRAMN`, `INITN` and `DONE` lines. Note that `INITN` and `DONE` are bidirectional, open-drain2) I/O only when enabled. A summary of the flow:

- `INITN` is asserted active-low (i.e. open-drain pulled high when inactive) and `DONE` is driven low.
- Until (1) the `tINITL` time period has elapsed, (2) the `PROGRAMN` pin is deasserted, and (3) `INITN` is not asserted low by an external master, the FPGA remains in an initialization state which clears SRAM.
- `INITN` signals an error in configuration.
- Configuration restarts when (1) the `PROGRAMN` input pin is asserted, (2) a `REFRESH` command is received through one of the FPGA's configuration ports, or (3) the power is cycled. `REFRESH` in particular is useful for hot-reloading of Flash memory.

A Feature Row also exists, which seems to behave similarly to MCU fuses. The MachXO2 Programming and Configuration guide also specifies the following options for programming:
Description follows:
JTAG port is enabled by default, with the following configuration:
and finally, the benefits of using JTAG interface:
This article talks about using the JTAG interface for its original purpose, for boundary-scan during testing:
Question now is, how does one program over the JTAG interface? Seems like there is no well-defined protocol to achieve this, see Electronics SO. And this is indeed the case: the flash programming steps are defined in section 10 of the configuration datasheet, under "10.1 Flash Programming". File formats supported seem to be MachXO2-defined JEDEC file format, STAPL, SVF or a hex/binary bitstream.
This looks like it might be super useful for understanding how JTAG works.
Some miscellaneous software to install: fxload
Getting myself a head-start using existing FPGA repositories, by listing the process used there. Infrastructure entails the following connections for writing firmware to the FPGA via the microcontroller (MCU):
7C68013A | 1200HC |
---|---|
PA5/FIFOADR1 | PT10C/TDO |
PA4/FIFOADR0 | PT10D/TDI |
PA0/INT0 | PT11C/TCK |
PA1/INT1 | PT11D/TMS |
+3.3V | PT15C/JTAGENB |
For Atmel MCUs:

- `avr-gcc`: cross compiler to compile MCU firmware
- `dfu-programmer`: interface with MCU in DFU mode (available with in-built Atmel bootloader)
- `jtagprog`: internal, firmware for MCU to act as FPGA programmer (JTAGMODE to exit the initial boot CDC interface)
- `usbprog`: internal, driver to recognize JTAG programmer interface of MCU
  - `dkms` to install driver, see SO
- `jtagprg`: internal, interface with JTAG programmer to upload firmware to FPGA, parser for SVF

Additionally, for Cypress MCUs:

- `sdcc`: compiling firmware into Intel hex format
- `fxload`: loading firmware onto chip
- `usbjtag`: internal, firmware for MCU to act as FPGA programmer

So far doesn't seem to be working:

- Plain MCU does not respond, but eventually does after disconnecting other devices from hub
The `assign` statement essentially drives a wire (which is initially unconnected) at the driven logical level. In a module, the input wires are assumed to be externally driven, which we then rely on to drive the output. Note that inputs to modules must be wires, and if the input/output datatype is not specified they are assumed to be wires of one-bit width as well (if the wire is not declared at all, it is also assumed to be a 1-bit wide wire; disable this behaviour with the directive `` `default_nettype none ``):
```verilog
module andGate(
    input a,    // wire
    input b,    // wire
    output c    // wire
);
    assign c = a & b;
endmodule

// or equivalently, in Verilog-1995
module andGate(a, b, c);
    input a, b;
    output c;
    assign c = a & b;
endmodule
```
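To convince myself the gate behaves as described, here is a minimal self-checking testbench sketch. The testbench module `andGate_tb` and its exhaustive loop are my own additions (not from any reference), and I am assuming a plain Verilog-2001 simulator such as Icarus Verilog.

```verilog
`default_nettype none

// Device under test, repeated here so the snippet is self-contained
module andGate(
    input  wire a,
    input  wire b,
    output wire c
);
    assign c = a & b;
endmodule

// Hypothetical testbench (name and structure are illustrative)
module andGate_tb;
    reg  a, b;      // driven procedurally, so they must be regs
    wire c;         // driven by the continuous assignment in the DUT
    integer i;

    andGate dut (.a(a), .b(b), .c(c));

    initial begin
        for (i = 0; i < 4; i = i + 1) begin
            {a, b} = i[1:0];    // walk through all four input combinations
            #1;                 // give the continuous assignment time to settle
            if (c !== (a & b))
                $display("FAIL: a=%b b=%b c=%b", a, b, c);
            else
                $display("ok:   a=%b b=%b c=%b", a, b, c);
        end
        $finish;
    end
endmodule
```

Note that with `` `default_nettype none `` in effect, every port needs an explicit `wire` (or `reg`), which is why the ports are spelled out in full here.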
There is the concept of blocking `=` and non-blocking `<=` assignments in an `always` block. This article explains non-blocking assignment nicely, using a twisted ring counter3), reproduced here. The intended behaviour of a twisted ring counter is achieved by using non-blocking assignment, which performs "assignment scheduling": all right-hand sides are evaluated first, the targets are only updated at the end of the time step, and a signal assigned more than once takes on the last assigned value:
```verilog
// 2-bit twisted ring counter
// { q_dff1, q_dff2 }: 10 -> 11 -> 01 -> 00 -> ...
always @(posedge clock) begin   // run on every positive clock edge
    q_dff1 <= ~q_dff2;          // inversion assigned after cycle finishes
    q_dff2 <= q_dff1;
end
```
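For simulation, the always block above needs a module around it and a known starting state. The sketch below is my own wrapper: the module names, the synchronous `reset` input and the `2'b10` start value are assumptions for illustration, not part of the referenced article.

```verilog
// Hypothetical wrapper around the 2-bit twisted ring counter
module twistedRing2(
    input  wire clock,
    input  wire reset,      // assumed synchronous, active-high
    output reg  q_dff1,
    output reg  q_dff2
);
    always @(posedge clock) begin
        if (reset) begin
            q_dff1 <= 1'b1; // assumed start state { q_dff1, q_dff2 } = 10
            q_dff2 <= 1'b0;
        end else begin
            q_dff1 <= ~q_dff2;  // right-hand side uses the old q_dff2
            q_dff2 <= q_dff1;   // right-hand side uses the old q_dff1
        end
    end
endmodule

// Hypothetical testbench that prints the state on every change
module twistedRing2_tb;
    reg  clock = 1'b0;
    reg  reset = 1'b1;
    wire q_dff1, q_dff2;

    twistedRing2 dut (
        .clock(clock), .reset(reset),
        .q_dff1(q_dff1), .q_dff2(q_dff2)
    );

    always #5 clock = ~clock;   // clock period of 10 time units

    initial begin
        $monitor("t=%0t {q_dff1,q_dff2}=%b%b", $time, q_dff1, q_dff2);
        #12 reset = 1'b0;       // release reset after the first rising edge
        #80 $finish;
    end
endmodule
```

If the non-blocking semantics are as described, the printed state should walk through the 10, 11, 01, 00 cycle from the comment in the original snippet.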
When using blocking assignment, values are assigned immediately as each statement is reached (like sequential statements in C), so later statements in the same block already see the updated values. This means the following two always blocks are equivalent:
```verilog
always @(posedge clock) begin
    q_dff1 = ~q_dff2;   // q_dff1 is immediately assigned the inverse of q_dff2...
    q_dff2 = q_dff1;    // ...so q_dff2 is assigned ~q_dff2
end

// equivalent circuit, which corresponds to a 1-bit twisted ring counter
// { q_dff2 }: 1 -> 0 -> ...
// q_dff1 is equivalent to q_dff2, so there is no need for a flip-flop
always @(posedge clock) begin
    q_dff2 <= ~q_dff2;
end
assign q_dff1 = q_dff2;
```
Small caveat: blocking and non-blocking assignments should not be mixed for the same signal. Note also that with blocking assignments, one can describe combinational logic inside an always block as well. As to why it's called blocking and non-blocking, see this SO answer for the most illuminating explanation I've ever seen4).
There are also `if - else if - else` constructs and `case` constructs that are similar to those of C:
```verilog
// 4-to-1 multiplexer (2-bit select) with if-else
always @(s) begin
    if (s == 2'b00) begin
        y = i0;
        z = ~z;
    end
    else if (s == 2'b01)
        y = i1;
    else if (s == 2'b10)
        y = i2;
    else
        y = i3;
end

// 4-to-1 multiplexer with case (and more efficient to boot, using a single LUT)
always @(s) begin
    case (s)
        2'b00 : begin
            y = i0;
            z = ~z;
        end
        2'b01   : y = i1;   // using binary
        2       : y = i2;   // using integer
        default : y = i3;   // using default case
    endcase
end
```
Loops are available as well, though mainly for use in testbenches (recall that the whole always block fully executes before values are updated, which means a synthesized loop is unrolled and simply reflects the last assigned values after a single cycle). A typical use case for a for loop is to group similar assignments to different outputs, see below:
```verilog
// forever loop
forever begin
    ...
end

// repeat loop (fixed number of times)
repeat (5) begin
    ...
end

// for loop
for (i = 1; i < 4; i = i + 1) begin
    shift[i] <= shift[i-1];
end

// while loop
i = 1;
while (i < 4) begin
    shift[i] <= shift[i-1];
    i = i + 1;
end
```
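To see the for loop in a synthesizable context, here is a sketch of a 4-stage shift register built around the same `shift[i] <= shift[i-1]` pattern. The module name, port names and depth are made up for illustration.

```verilog
// Hypothetical 4-stage serial-in, serial-out shift register
module shiftReg4(
    input  wire clock,
    input  wire din,
    output wire dout
);
    reg [3:0] shift;
    integer i;

    always @(posedge clock) begin
        shift[0] <= din;
        // The loop bounds are constant, so this is equivalent to writing out
        //   shift[1] <= shift[0];
        //   shift[2] <= shift[1];
        //   shift[3] <= shift[2];
        for (i = 1; i < 4; i = i + 1)
            shift[i] <= shift[i-1];
    end

    assign dout = shift[3];     // value that entered four clock edges earlier
endmodule
```

Because the assignments are non-blocking, every stage picks up the old value of its neighbour, so the data moves by exactly one stage per clock edge regardless of the loop order.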
A quick note that VHDL and Verilog can be interchanged, following the usual rules of port mapping when using VHDL modules in Verilog.
These two resources are pretty good at giving further insights on the design flow and abstraction layers.
A quick syntax overview of Verilog: there is the 1995 standard, and the 2001 standard. Described here will be the 2001-standard to fully capture most features (what is backward compatibility when the specification is >20 years old?). Here's a nice reference guide for Verilog-2001 by Sutherland HDL. In the style of LearnXinYminutes, assuming knowledge of C, and content adapted from ChipVerify:
```verilog
// Single line comment
/* Multi-line
   comment */

// Verilog is similar to C syntax: (1) case-sensitive, (2) ;-terminated.

// Numbers denoted [size]'[format][number], where size is number of bits
// The following are all the decimal number 10, underscores legal:
//   4'b1010 = 4'd10 = 4'o12 = 4'hA = 4'b1_010

// Default format is decimal, default size is datatype size, negation in front
integer a = -10;    // decimal -10
integer b = -'hA;   // also -10, 32-bit integer by default

// Values are 4-state, whose actual voltages model that of the hardware:
//   0 represents logical zero / false
//   1 represents logical one / true
//   X represents unknown logical value (not "don't care")
//   Z represents high impedance state, default if unconnected

// Strings as per C-style

// Identifiers must follow regex ^[A-Za-z_][A-Za-z0-9_$]*$ and not collide with keywords
```
Conventions for datatypes in timing diagrams and simulator waveforms are as follows:
Nets simply connect between hardware entities, and do not store values. Used to convey signal strength from connected components. A wire is the most commonly used type of net, representing an electrical wire:
```verilog
wire my_wire1;
wire [3:0] my_wire2;   // 4-bits wide (4 wires), from bit 3 to bit 0
// Other net datatypes include: tri, wand, supply0, ...
```
Variables represent an abstraction of a data storage element. Several options are available:
```verilog
reg [3:0] z = 3'b010;   // general register, can be any size (default size 1)
integer a = 109;        // signed 32-bit number
time b = 50;            // unsigned 64-bit number (to represent time)
real c = 40.25;         // 64-bit float
realtime d = 4.32;      // 64-bit float (to represent time)
// Unit suffixes such as 50ns or 40.25ps are SystemVerilog time literals,
// plain Verilog-2001 values are in units of the current `timescale

// Strings can be stored, truncating or zero-padding from the left:
reg [8*11-1:0] str = "Hello world";   // 11 characters of 8 bits each
```
Multi-bit nets and variables are known as vectors (as opposed to single-bit scalars). Reads/writes are similar to C-style arrays (and Python numpy array writes), with the notable difference that out-of-bounds vector accesses return `X`.
```verilog
reg [3:0] addr = 4'b0Z00;
addr [2:1] = 2'b1X;   // addr is now 4'b01X0
addr [2];             // returns 1
```
Arrays can be declared for `reg`, `wire`, `integer` and `real` types. Array access is similar to C-style arrays. A 1D array of reg variables is referred to as memory:
```verilog
reg       y1 [3:0];        // scalar reg 1D array of depth 4, aka memory
reg [7:0] y2 [3:0][1:0];   // 8-bit reg 2D array of rows 4 and columns 2
```
Operators follow a mix of C and Python styles:
```
Arithmetic:    +, -, *, /, %, ** (may not synthesize)
Bitwise:       ~, |, &, ^, ~&
Relational:    >, <, >=, <=, ==, !=
Logical:       !, ||, &&
Conditional:   ? :
Shift:         >>, << (logical shift)
               >>>, <<< (arithmetic shift, MSB filled)
Sign-change:   $unsigned(), $signed()
Concatenation: {}, {{}}
```
The concatenation and replication operators combine smaller vectors into larger vectors:
```verilog
wire [1:0] a = 2'b01;
wire [2:0] b = 3'b0X1;
wire [7:0] c;
assign c = { 1'bZ, {2{a}}, b };   // c = 8'bZ01010X1
// a concatenation on the left-hand side is also valid (use in place of the line above):
// assign { c[7:6], c[5:0] } = { 1'bZ, {2{a}}, b };

// convention is little-endian, cannot swap endianness once declared
wire [3:0] w = 4'b01XZ;
// invalid: assign w_rev = w[0:3];

// for vector reversal, use the concatenation instead (into a separate net)
wire [3:0] w_rev;
assign w_rev = { w[0], w[1], w[2], w[3] };

// alternatively with for loops (the target must then be a reg),
reg [3:0] w_rev_loop;
integer i;
always @(*) begin
    for (i = 0; i < 4; i = i + 1)
        w_rev_loop[i] = w[3-i];
end

// or using a generate-for loop,
wire [3:0] w_rev_gen;
genvar g;
generate
    for (g = 0; g < 4; g = g + 1) begin: block_name
        assign w_rev_gen[g] = w[3-g];
    end
endgenerate
```
`localparam`, `parameter` and `defparam` are used to set constants (latter two for structural modeled code). Syntax for localparam is `localparam N = 2, P = 3;`.
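A sketch of how I understand the three keywords to fit together, using a made-up parameterised counter. All module, instance and parameter names here are hypothetical.

```verilog
// Hypothetical counter whose width is set per instance
module counter #(
    parameter WIDTH = 4                 // default, can be overridden from outside
)(
    input  wire             clock,
    input  wire             reset,
    output reg  [WIDTH-1:0] count
);
    localparam MAX = (1 << WIDTH) - 1;  // derived constant, cannot be overridden

    always @(posedge clock) begin
        if (reset || count == MAX)
            count <= 0;
        else
            count <= count + 1;
    end
endmodule

// Hypothetical top level showing structural use of the parameter
module top(
    input  wire       clock,
    input  wire       reset,
    output wire [3:0] count4,
    output wire [7:0] count8
);
    // uses the default WIDTH = 4
    counter c4 (.clock(clock), .reset(reset), .count(count4));

    // overrides WIDTH at instantiation
    counter #(.WIDTH(8)) c8 (.clock(clock), .reset(reset), .count(count8));

    // defparam sets a parameter from outside the instantiation instead, e.g.
    //   defparam c8.WIDTH = 8;
    // (as far as I can tell this style is generally discouraged in newer code)
endmodule
```

The split seems to be: `parameter` for values the instantiating module may change, `localparam` for constants private to the module.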
There are four modeling styles when writing Verilog code:
When using continuous assignment statements, the relationship between signals is defined instead of explicitly specifying the structure of the design.
```verilog
module comparator2Bit(
    input wire[1:0] a, b,
    output wire eq
);
    wire[3:0] s;
    assign s[0] = ~a[1] & ~a[0] & ~b[1] & ~b[0];
    assign s[1] = ~a[1] &  a[0] & ~b[1] &  b[0];
    assign s[2] =  a[1] & ~a[0] &  b[1] & ~b[0];
    assign s[3] =  a[1] &  a[0] &  b[1] &  b[0];
    assign eq = s[0] | s[1] | s[2] | s[3];
endmodule
```
Structural modeling uses composition to assemble larger designs.
```verilog
module comparator1Bit(
    input wire x, y,
    output wire eq
);
    wire s0, s1;
    assign s0 = ~x & ~y;
    assign s1 = x & y;
    assign eq = s0 | s1;
endmodule

module comparator2BitStruct(
    input wire[1:0] a, b,
    output wire eq
);
    wire s0, s1;
    comparator1Bit eq_bit0 (.x(a[0]), .y(b[0]), .eq(s0));
    comparator1Bit eq_bit1 (.x(a[1]), .y(b[1]), .eq(s1));
    assign eq = s0 & s1;
endmodule
```
Using procedural statements involves nesting them within an 'always' block. These statements are run sequentially, so the order of the statements matters. This contrasts with the 'always connected' wires in the continuous assignment strategy where statements are executed in parallel.
```verilog
module comparator2BitProcedure(
    input wire[1:0] a, b,
    output reg eq
);
    always @(a,b)   // sensitivity list
    // block executes iff there are changes in 'a' and/or 'b'
    begin
        if (a[0]==b[0] && a[1]==b[1])
            eq = 1;
        else
            eq = 0;
    end
endmodule
```
Finally, a mixed design philosophy is also supported:
```verilog
module comparator2BitMixed(
    input wire[1:0] a, b,
    output wire eq
);
    reg s0, s1;     // one bit each is enough for the per-bit comparison results

    always @(a,b)
    begin
        if (a[0]==b[0])
            s0 = 1;
        else
            s0 = 0;
    end

    always @(a,b)
    begin
        if (a[1]==b[1])
            s1 = 1;
        else
            s1 = 0;
    end

    assign eq = s0 & s1;
endmodule
```
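Since all four styles are supposed to describe the same comparator, an exhaustive testbench is a cheap way to check that. The sketch below is my own: the testbench name is hypothetical, and the compact `comparator2Bit` stand-in is only there to keep the snippet self-contained (swap in any of the variants above by changing the instantiated module name).

```verilog
// Compact stand-in for the 2-bit comparator so the snippet runs on its own
module comparator2Bit(
    input  wire [1:0] a, b,
    output wire       eq
);
    assign eq = (a == b);
endmodule

// Hypothetical exhaustive testbench
module comparator2Bit_tb;
    reg  [1:0] a, b;
    wire       eq;
    integer    i, errors;

    comparator2Bit dut (.a(a), .b(b), .eq(eq));

    initial begin
        errors = 0;
        for (i = 0; i < 16; i = i + 1) begin
            {a, b} = i[3:0];    // all 16 combinations of two 2-bit inputs
            #1;                 // let the outputs settle
            if (eq !== (a == b)) begin
                errors = errors + 1;
                $display("mismatch: a=%b b=%b eq=%b", a, b, eq);
            end
        end
        $display("done, %0d mismatches", errors);
        $finish;
    end
endmodule
```

The `!==` comparison is deliberate: unlike `!=`, it also flags an output that is stuck at X or Z.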
Digital design mainly follows one of two methods:
The following diagram summarizes pretty much everything:
While the complete design can be written using purely sequential statements, the resulting hardware may be too complex or may not even synthesize. Best is to construct small units then piece them together with structural modeling.
This looks like a good guide: VerilogGuide.
The vendor synthesis tool actually serves as a pretty useful tool for looking up different chip specifications (and what they mean) too. See for example the MachXO2 chip we use, with part number LCMXO2-2000HC-6TG100I:
Still trying to identify how to even take the first step towards programming on an FPGA. How is Verilog related to FPGA, other than it just being the language for describing circuits?
Some things I can consider doing:
Some things I think I learnt:
The workflow for designing FPGAs is aptly described in asic-world tool page, and adapted here (highly recommend to give the page a read though - it has illustrative diagrams):
Stage | Description | Format | Tool |
---|---|---|---|
Specification | Specification of important parameters in system, e.g. a counter with 4-bit wide bus, have synchronous reset, etc. | Text | - |
High-level design | Splitting of design into blocks based on function, e.g. registers, ALU, instruction decode, etc. | Text + Waveform drawings | GTKWave (free) |
Low-level design | Description of how each block is implemented, e.g. state machines, counters, mux, decoders, internal registers. | | |
RTL coding | Writing code itself | HDL code | Covered (code coverage, free) |
Simulation | Verifying functional characteristics by writing a testbench which generates clk, reset and other required test vectors. Waveform output is validated. Timing simulation performed after synthesis and P&R for timing validation | Simulators | ModelSim (popular, commercial), VCS (fastest, commercial), Verilog-XL (the standard sign-off, commercial), Icarus (open-source) |
Synthesis | Conversion of RTL and constraints into target technology primitives, with formal verification to check if RTL to gate mapping is correct | Synthesizers | Xilinx Vivado (for Xilinx), Lattice Diamond (for Lattice), Synopsys Design Compiler (commercial) |
Place & Route | Netlist generated from synthesis is used to place gates and route the clock tree/reset (GDS output in the case of ASICs) | P&R tools | Typically use FPGA vendors' P&R tools. |
Post-Si validation | Testing with actual board and chip, requires device drivers | Chip | - |
Side-note: Schematic generation is usually not a common thing to do in FPGA design, since it's more behavioural. For schematic generation, can either perform using Xilinx Vivado or yosys - these are synthesis tools since the HDL needs to be converted into gates, etc. See the following tutorial using yosys.
Consider reading EDA Playground's settings to see what options are available.
Given an initial state of `4'b1000` in a 4-bit ring counter, the sequence follows the cycle: 1000 -> 0100 -> 0010 -> 0001 -> ...

In a twisted ring counter, a NOT gate is present at the output of the last flip-flop, i.e. given the same initial state of `4'b1000` in a 4-bit twisted ring counter, the sequence instead follows: 1000 -> 1100 -> 1110 -> 1111 -> 0111 -> 0011 -> 0001 -> 0000 -> ...