Log

2022-12-06 Tuesday

Summary of addressing modes and corresponding instructions

The converse to the LPM instruction is the SPM instruction, which in its basic form writes contents of the word stored in R1:R0 to the address specified by the Z register. Looking up the description of SPM in the instruction set is mildly confusing, with some terminology needing to be demystified:

LPM and SPM use "indirect addressing": the program-memory address is taken from the Z-register (X and Y cannot be used for this). Data space (SRAM), on the other hand, is accessed either indirectly with LD/ST, using an address held in X, Y or Z, or directly with LDS/STS, where the address is encoded in the instruction itself.

Available instructions are device-dependent and are best looked up in the instruction set summary of the corresponding device datasheet. See, for example, the subset of data transfer instructions for ATmega16:

Some larger devices contain an additional 8-bit RAMPZ register that allows the ELPM (extended LPM) instruction (if implemented) to address a total of 24 bits. This concatenation is denoted (RAMPZ:Z).
Specific SPM usage is documented in the device datasheet. In some instances, it can also set bootloader lock bits. The full instruction description is attached below (truncating the examples):

Tying it all together now. Recall the memories of the ATmega16, which consist of:

Flash program memory of 16 KB (16,384 bytes, organized as 8K 16-bit words) with 10,000 write/erase cycles of endurance, divided into separate Boot Loader and Application sections for software security.
Volatile SRAM data memory spanning 1,120 bytes of address space, divided into 32 general-purpose registers, 64 I/O registers, and 1,024 bytes of internal data SRAM.
EEPROM data memory of 512 bytes with 100,000 write/erase cycles of endurance. Access is controlled through the EEPROM control register (EECR), either by manually setting the read/write enables or by using the EEPROM Ready interrupt, while the EEPROM address and data go into the EEARH/EEARL and EEDR registers respectively.

The complete description of how data and program space can be accessed is thus summarized below (originally found in the datasheet, but regrouped by memory type):

Program	Effect	Remarks
LPM	Load byte from program memory into specified register (R0 if none given)	Byte address stored in Z (ZH:ZL)
SPM	Store contents of R1:R0 to program memory	Address stored in Z (ZH:ZL)
General	Effect	Remarks
MOV	Copy byte between registers	-
MOVW	Copy word between register pairs	-
LDI	Load immediate (constant) value into register	R16-R31 only
I/O	Effect	Remarks
IN (OUT)	Read (write) I/O register into (from) specified register	I/O address 0x00-0x3F
SBI (CBI)	Set (clear) bit in I/O register	Lower 32 I/O registers (0x00-0x1F) only
Data	Effect	Remarks
LD (ST)	Load from (store to) SRAM into (from) specified register	Address stored in X, Y, or Z
LDS (STS)	Load from (store to) SRAM into (from) specified register	16-bit address encoded in the instruction
PUSH (POP)	Push register onto (pop register from) the SRAM stack	Address held in stack pointer SP

Strings

Strings (as per C convention) are null-terminated, so we can define the following. Note that program memory is organized in words, so the assembler will complain (with a warning) and automatically append a zero padding byte if the total length including the null terminator is not an even number of bytes.

.equ	NULL	= 0x00
...
TXTSTR:
.db	"Hello world!",NULL

In the UART receive interrupt handler, the received character is read and compared against 'm' (an if-else), then forwarded to either SEND_CHAR or SEND_MESSAGE:

UART_RXC:
	in	CHAR,UDR
	cpi	CHAR,'m'	; compare char to 'm', sets Z-flag if true
	brne	SEND_CHAR
	rcall	SEND_MESSAGE
	rjmp	FINISH
SEND_CHAR:
	inc	CHAR
	rcall	RS_SEND
FINISH:
	reti

If SEND_MESSAGE is invoked, the byte address of TXTSTR (its word address times 2) is loaded into Z, then each character is loaded with post-increment and sent until the null terminator is reached.

SEND_MESSAGE:
	ldi	ZH,HIGH(TXTSTR*2)
	ldi	ZL,LOW(TXTSTR*2)
WRITE:	lpm	CHAR,Z+
	tst	CHAR		; check if null-terminated, i.e. cpi CHAR,0
	breq	WRITE_FINISH
	rcall	RS_SEND
	rjmp	WRITE
WRITE_FINISH:
	ret

For posterity, the code in RS_SEND to transmit data over USART:

RS_SEND:
	sbi	UCSRB,TXEN
	sbis	UCSRA,UDRE
	rjmp	RS_SEND
 
	out	UDR,CHAR
	cbi	UCSRB,TXEN
	ret

With this, I'm finally done with the assembly section of the Kanda AVR tutorials for ATmega16. Next up are the C equivalents, which are more up my alley. To consider: reading up on integrating assembly with C code using avr-gcc with the -x assembler-with-cpp option.

2022-11-27 Sunday

Data indirect addressing and data tables (.DB, LPM)

This time working with the .DB directive (see its description), which essentially stores constants in flash memory. Since flash memory is organized in two-byte words, the .DB directive pads its constants up to a two-byte word boundary. This program space can be read using the LPM (load program memory) instruction.

The table below is 16 bytes long, so we declare a constant for its size:

.equ	_TABLE_SIZE_B	= 16
...
VALUES:
.db	0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80
.db	0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01

Same initialization for Timer0, but here we additionally define loop iterators:

INIT:	; Initialize table loop iterators
	ldi	INDEX,0
	ldi	TABLE_SIZE,_TABLE_SIZE_B	; store value in register for comparison
 
MAIN:
LOOP:	; Code labels are word addresses, but LPM expects a byte address (hence 2*VALUES)
	; Using Z register which can store 16-bit words, and load address of the table
	ldi	ZH,high(2*VALUES)
	ldi	ZL,low(2*VALUES)
 
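	; Add the iterator value to the table address. adiw ZH:ZL,INDEX won't assemble:
	; adiw only takes an immediate constant (0-63), not a register, so use add/adc
	clr	TEMP		; TEMP is overwritten later in the loop, so re-zero it here
	add	ZL,INDEX
	adc	ZH,TEMP		; propagate the carry from ZL into ZH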
 
	; Load Program Memory (LPM) loads one byte pointed to by Z register, which stores
	; the byte address. Since program memory is organized in 16-bit words, the
	; least significant bit determines whether the low or high byte is selected.
	; If no register specified, assume to be written to R0, otherwise syntax is
	; 'lpm r16,Z' or 'lpm r16,Z+' (for post-increment).
	lpm
	mov	TEMP,r0
	com	TEMP		; inversion for LED
	out	PORTB,TEMP
	rcall	DELAY
 
	; Loop through the entire table
	inc	INDEX
	cp	INDEX,TABLE_SIZE
	brne	LOOP
 
	; Reset index and repeat program again
	clr	INDEX
	rjmp	MAIN

2022-11-08 Tuesday

Important to note that datasheets are designed to be complete, i.e. all features are dumped in one go so that the datasheet works as a reference. But a first read doesn't need all that information; understanding how a feature works and the design decisions behind it matters more.

It is thus critical to write notes that have a natural progression of ideas and features.

2022-11-05 Saturday

USART

Continuing from where we left off on the USART topic, this time digging a little deeper into the datasheet.

Actually, the topic is large enough that it warrants a separate description on its own page, located here.

2022-09-09 Friday

Commercial devices have a clock recovery chip that can introduce artifacts in phase measurement.

2022-09-08 Thursday

USART

This topic is a little tricky to understand; it helps to first understand what a serial port is, as well as its communication protocols.

Serial port: A serial communication interface between devices. Used in place of modern USB standards where compatibility and simplicity are needed.
- Serial: One bit is sent at a time, in contrast with a parallel port, where multiple bits are sent simultaneously.
- Interfaces: examples include Ethernet, USB, RS-232-compliant hardware, etc.

USART stands for "Universal Synchronous/Asynchronous serial Receiver and Transmitter"

Some constants are defined in the assembly code, and it looks like the programmer takes its cue from these as well:

.equ	F_CPU		= 8000000	; clock frequency external crystal 
.equ	BAUDRATE	= 19200		
.equ	BAUDCONST	= (F_CPU/(16*baudrate))-1

First, the serial port needs to be initialized, with the receiver allowed to trigger an interrupt.

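INIT:	; Setup serial port
	clr	TEMP
	out	UBRRH,TEMP	; high byte first (URSEL = 0 selects UBRRH, not UCSRC)
	ldi	TEMP,BAUDCONST
	out	UBRRL,TEMP	; writing UBRRL updates the baud rate prescaler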
	sbi	UCSRB,RXCIE	; receiver interrupt enable bit
	sbi	UCSRB,RXEN	; receiver enable bit
	sei			; global interrupt enable bit

which brings us to the interrupt handler that runs once the receiver has received a character:

UART_RXC:
	in	CHAR,UDR	; read character
	rcall	RS_SEND		; resend command
	reti

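This triggers the transmit code and also updates the LEDs attached to port B. Here the transmitter is enabled (TXEN) only when a transmission is triggered, and disabled once done; per the datasheet, clearing TXEN only takes effect after any ongoing and pending transmissions have completed.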

RS_SEND:
	sbi	UCSRB,TXEN	; transmit enable bit set
	sbis	UCSRA,UDRE	; skip next instruction once UDR is empty (ready for new data)
	rjmp	RS_SEND
 
	inc	CHAR		; increment character value to be bounced back
	out	UDR,CHAR	; transmit
	rcall	LED_DISPLAY	; set display
 
	cbi	UCSRB,TXEN	; transmit enable bit clear
	ret
 
LED_DISPLAY:
	com	CHAR		; invert for LEDs (active low)
	out	PORTB,CHAR
	ret

2022-09-04 Sunday

Simple single-channel ADC (ADMUX, ADCSRA, ADCL, ADCH)

The ADC is a 10-bit converter for single-ended channel reads (with lower resolution if a differential input with gain is used). First define the working registers that hold the ADC result:

.def	ADCSTOREH	= r17	; 8 MSBs of the result, since left-aligned (ADLAR = 1, see below)
.def	ADCSTOREL	= r18	; 2 LSBs of the result, in bits 7:6

We configure the ADC to have the following properties:

Voltage reference (full-scale value) set to AVCC
Left-aligned ADC result: the high register of the ADC store contains the 8 MSBs, and the low register the remaining 2 LSBs
ADC clock derived from the system clock prescaled by 64
ADC enabled

INIT:	...
 
	; Use 'ori' to perform in-place OR with an immediate value
	ldi	TEMP,(1<<REFS0)|(1<<ADLAR)	; '01' -> AVCC reference, left-aligned
	ori	TEMP,ADCCHANNEL		; set MUX for single-ended channel
	out	ADMUX,TEMP
 
	; Initialize control register for ADC
	; Prescaler can be smaller for a higher rate if lower resolution is acceptable
	ldi	TEMP,(1<<ADPS2)|(1<<ADPS1)	; prescaler = 64 (125kHz for 8MHz clock)
	out	ADCSRA,TEMP
	sbi	ADCSRA,ADEN		; enable ADC

A conversion is triggered manually by setting the ADSC bit in the ADCSRA control register; ADSC reads as one while the conversion is in progress and is cleared by hardware once it completes. The result is then read from ADCL first and ADCH last (reading ADCL locks the data registers until ADCH has been read).

MAIN:	sbi	ADCSRA,ADSC		; trigger first conversion
LOOP:	sbic	ADCSRA,ADSC		; skip next instruction if bit cleared
	rjmp	LOOP
 
	; Conversion completed - read
	in	ADCSTOREL,ADCL		; read low byte first
	in	ADCSTOREH,ADCH		; read high byte last
 
        ...

Because I lacked a convenient jumper wire, I used the ADC7 channel via the ribbon cable as a substitute.

Understanding structure of a datasheet

It is important to recognize that a microcontroller is typically designed in a modular fashion, hence the separate datasheet sections for each specific feature.

It helps to give the relevant section a quick read, which will provide the following information:

Rough overview of available capabilities (performance, features, warnings and pitfalls)
Relevant registers for configuration and interfacing (last section)
Instruction set

After this, scroll to the following sections for additional information:

Pin configuration section to check pin locations
Register summary section to see location of registers
Instruction set to see possible interfacing instructions

Question: Is it possible to separate application code from boilerplate in assembly files?

2022-08-28 Sunday

Read the datasheet Section 12 for I/O pins and Section 13 for external interrupts.

Some terminology to clear first:

A pull-up resistor pulls the voltage on a line up to HIGH when nothing else drives it (this requires the connected input to have sufficiently high impedance, and the pull-up resistor to have sufficiently low resistance relative to it). A PU of higher resistance is weak (slower to reach HIGH, but draws less current while the line is held LOW), while a PU of lower resistance is strong.
An output pin has low impedance to HIGH/LOW voltages, allowing it to act as a current source/sink. This contrasts with an input pin with high impedance, which allows the device to simply read the HIGH/LOW state of the pin.

A nice schematic from a set of ENGR40M lecture notes, showing output and input configurations:

An external switch only defines a connection. The pull-up (pull-down) resistor converts the connection state into a HIGH/LOW voltage state, by "pulling up" ("down") the input pin whenever the switch is open, i.e. not connecting the pin to ground (or to VCC, for a pull-down).

Hardware timer with external interrupt (MCUCR, GICR)

Section 12 describes the required configuration for the DDRx and PORTx registers. Note that tri-state is the high-impedance (high-Z) state, which effectively removes the device's influence on the circuit. Note also that switching between tri-state ({DDxn, PORTxn} = 0b00) and output high (0b11) necessarily passes through an intermediate state: either pull-up enabled (0b01) or output low (0b10). In a high-impedance environment the pull-up state is normally harmless; if it is not acceptable, the pull-up disable (PUD) bit in the SFIOR register can be set to disable all pull-ups on all ports.

Since the LEDs on the STK-200 are connected to VCC (active low), we run that port as output with 0b11 (output high, LEDs off) as the initial state. The switch is set up in the high-Z state, i.e. 0b00. The partial code below enables the INT0 input pin as an interrupt source:

EXT_INT0:
	dec	COUNT		; decrement count register
	out	PORTB,COUNT	; send it to LEDs
	reti
 
INIT:	; Note: Ports are input (high-impedance) by default
	; Initialize external interrupt INT0
	ldi	TEMP,(1<<ISC01)		; Interrupt Sense Control - trigger on falling edge
	out	MCUCR,TEMP
	ldi	TEMP,(1<<INT0)		; enable INT0 interrupt
	out	GICR,TEMP
	sei				; enable interrupts globally

Fuses

Had some experience with this already, here's a recap:

BODEN and BODLEVEL for brown-out detection enable and level, default disabled
SUT1:0 for start-up time, giving the power supply and oscillator time to stabilize; value derived in conjunction with CKSEL, default 65 ms
CKSEL3:0 to select internal/external clocks, default internal 1 MHz
OCDEN enables on-chip debugging, default disabled
JTAGEN enables JTAG interface, default enabled
SPIEN enables SPI interface, default enabled
CKOPT sets whether full rail-to-rail oscillation signal is used, for noisy environments, default unprogrammed (disabled)
EESAVE sets whether EEPROM is preserved upon chip erase, e.g. for serials, default disabled
BOOTSZ1:0 sets the size of the boot block to store bootloader, default 1024 words at 0x1c00 to 0x1fff
BOOTRST sets whether reset vector points to application code or boot code, default application

According to the datasheet, for an external 8 MHz crystal oscillator, CKOPT = 1 (unprogrammed) and CKSEL3:1 = 111, so setting CKSEL0 = 1 and SUT1:0 = 10 gives a 16K CK start-up time plus a 4.1 ms delay after RESET.

2022-08-27 Saturday

With regard to styling for assembly, pick a style and forget about it: it's all a matter of project convention. A preliminary style guide:

ASM mnemonics and directives in lowercase. Everything else in uppercase.
Tab size 8. Only labels in first column, only directives and instructions in second column (or comments), everything else in third column without spaces, comments last.
Comments to always begin with semicolon.

Hardware timer with Timer0 (TCCR0, TCNT0, TIFR, TOV0)

Implementing a timer using the 8-bit Timer0, which is the simplest hardware timer to operate (Timer2 is also 8-bit, while Timer1 is 16-bit). Since we set a timer prescaler of 1024 with an internal clock of 1 MHz:

$$ \text{T0 overflow rate} = \frac{10^6\ \text{Hz}}{1024 \times 256} \approx 3.81\ \text{Hz}, \qquad T = \frac{1}{3.81\ \text{Hz}} \approx 262\ \text{ms} $$

init:	ldi	TEMP,5
	out	TCCR0,TEMP		; set timer prescaler to 1024
	clr	TEMP
	out	TCNT0,TEMP		; clear timer0 count I/O register
 
main:	rcall	wait
	...
 
wait:	in	TEMP,TIFR		; read T0 flags into register
	sbrs	TEMP,TOV0		; test T0 overflow, skip next line if set
	rjmp	wait			; loop if not set
 
	ldi	TEMP,(1<<TOV0)		; flags are cleared by writing a logical one
	out	TIFR,TEMP		; clear TOV0 only, leaving other flags untouched
	ret

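This is a hardware timer: T0 runs in the background, and the TOV0 flag is set automatically when TCNT0 overflows from 255 (0xFF) back to 0. The flag is cleared by writing a logical one to it, or by hardware when the corresponding interrupt handler is executed.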

Again, use a for-loop construct to repeat the instruction multiple times:

delay:	ldi	REPEAT,8		; FOR loop start
wait:	...
	dec	REPEAT
	brne	wait			; FOR loop end
	ret

Hardware timer with interrupt vector (TOIE0, TIMSK, sei, reti)

Instead of manually checking the overflow flag, we can let the hardware interrupt the program flow automatically. An interrupt is triggered only if:

Interrupts are enabled (sei to set I-bit in status register SREG)
Timer0 interrupt is enabled (set TOIE0-bit in TIMSK)
An overflow occurred (TOV0 is set; it is cleared automatically by hardware when the interrupt handler is executed)

During an interrupt, the program counter is pushed to the stack, which is eventually popped off the stack when reti is called, and loaded back into the program counter.

The interrupt table can be found in the datasheet. Looks like this:

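The vector addresses in the datasheet's interrupt table are program-memory (word) addresses, and in the code segment the .org directive also counts words, so the Timer0 overflow vector is placed with .org 0x12 (its byte address, 0x24, is double that).

...
	; Termed the Interrupt Service Routine (ISR), or Interrupt Handler
	.org	0x12		; T0 OVF interrupt vector (word address; byte address 0x24)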
	in	TEMP,PINB	; read current value of PORTB (via PINB)
	com	TEMP		; invert one's complement, i.e. NOT
	out	PORTB,TEMP	; write to PORTB
	reti			; return from interrupt
 
INIT:   ...
	ldi	TEMP,5
	out	TCCR0,TEMP	; set timer prescaler to 1024
	clr	TEMP
	out	TCNT0,TEMP	; clear T0 count register
	ldi	TEMP,(1<<TOIE0)	; "T0 Overflow Interrupt Enable" is bit-0 in TIMSK
	out	TIMSK,TEMP	; write to "Timer Interrupt MaSK" register
	sei			; enable all interrupts by setting I-bit in SREG
 
MAIN:	rjmp	MAIN		; Loop forever

Hardware timer with Timer1 (TCNT1, TCCR1, OCR1A)

Since a JMP instruction is two words long, matching the two-word spacing of the ATmega16 interrupt vectors, we can quickly assemble an interrupt table as the boilerplate code:

.org		0
	jmp	INIT
	jmp	EXT_INT0	; External 0 interrupt vector
	jmp	EXT_INT1	; External 1 interrupt vector
	jmp	TIM2_COMP	; Timer 2 Compare interrupt vector
	jmp	TIM2_OVF	; Timer 2 Overflow interrupt vector
	jmp	TIM1_CAPT	; Timer 1 Capture interrupt vector
	jmp	TIM1_COMPA	; Timer 1 CompareA interrupt vector
	jmp	TIM1_COMPB	; Timer 1 CompareB interrupt vector
	jmp	TIM1_OVF	; Timer 1 Overflow interrupt vector
	jmp	TIM0_OVF	; Timer 0 Overflow interrupt vector
	jmp	SPI_HANDLE	; SPI Transmit interrupt vector
	jmp	UART_RXC	; UART RX Complete interrupt vector
	jmp	UART_DRE	; UDR Empty interrupt vector
	jmp	UART_TXC	; UART TX Complete interrupt vector
	jmp	ADC_COMP	; ADC Conversion Complete interrupt vector
	jmp	EE_RDY		; EEPROM Ready interrupt vector
	jmp	ANA_COMP	; Analogue Comparator interrupt vector
	jmp	TWI		; TWI interrupt vector
	jmp	EXT_INT2	; External 2 interrupt vector
	jmp	TIMER0_COMP	; Timer 0 Compare Match vector
	jmp	SPM_RDY		; Store Program Memory Ready interrupt vector
 
EXT_INT0:	reti
EXT_INT1:	reti
TIM2_COMP:	reti
TIM2_OVF:	reti
TIM1_CAPT:	reti
TIM1_COMPA:
	in	TEMP,PINB
	com	TEMP
	out	PORTB,TEMP
	reti
 
TIM1_COMPB:	reti
TIM1_OVF:	reti
TIM0_OVF:	reti
SPI_HANDLE:	reti
UART_RXC:	reti
UART_DRE:	reti
UART_TXC:	reti
ADC_COMP:	reti
EE_RDY:		reti
ANA_COMP:	reti
TWI:		reti
EXT_INT2:	reti
TIMER0_COMP:	reti
SPM_RDY:	reti
 
INIT:   ...

Running Timer1 is not much more involved than Timer0. However, since the counter is 16-bit, waiting for it to overflow takes a long, fixed time (65536 × 1024 µs ≈ 67 s at 1 MHz), so instead we load a maximum count TMAX into the output compare register OCR1A, enable the interrupt for when the T1 counter hits this value, and let the timer clear itself on compare match (CTC mode).

.equ	F_CPU	= 1000000
.equ	seconds	= 1
.equ	TMAX	= (F_CPU/1024) * seconds
 
...
INIT:	...
	; Initialize Timer1
	clr	TEMP
	out	TCNT1H,TEMP		; clear Timer1 counter
	out	TCNT1L,TEMP
	out	TCCR1A,TEMP		; disable output compare for Timer1
	ldi	TEMP,0x0d		; set CTC mode with OCR1A source and 1024 prescaler 
	out	TCCR1B,TEMP		; - read WGM13:0 table and CS12:0 table for details
	ldi	TEMP,high(TMAX)		; write MAX value to OCR1A
	out	OCR1AH,TEMP
	ldi	TEMP,low(TMAX)
	out	OCR1AL,TEMP
	ldi	TEMP,(1<<OCIE1A)	; enable Timer1 interrupt via output compare A
	out	TIMSK,TEMP
 
	; Disable the analog comparator
	ldi	TEMP,(1<<ACD)		; ACD = Analog Comparator Disable, saves power
	out	ACSR,TEMP
	sei

Important note: 16-bit registers must be read low byte first, and written high byte first. Reading the low byte latches the high byte into an internal 8-bit temporary register, and writing the low byte commits the previously written high byte from that register, so the full 16-bit value is transferred in a *single* clock cycle.

2022-08-26 Friday

Calculate exact cycles for timer from specified clock speed

Starting a quick progress log instead, since it will be easier to keep up than writing full-blown tutorials.

A more precise timing definition, although still plagued with a couple of problems:

The actual delay visibly lags within a minute. It shouldn't be due to the few additional instructions, since the chip is running on a 1 MHz clock. Strange.
On the last pass of the inner loop, brne takes only 1 cycle instead of 2 (the branch is not taken), so that iteration is 3 cycles rather than 4.

.equ	F_CPU=1000000			; defines target clock speed in Hz
.equ	CYCLES_PER_MS=(F_CPU/4000)	; extra factor 4 since each loop is 4 cycles
...
 
MAIN:	ldi	seconds,2
	rcall	DELAYSEC
	...
 
; Precise timing subroutines
; Works by decrementing a word
DELAYMS:
	ldi	XH,HIGH(CYCLES_PER_MS)
	ldi	XL,LOW(CYCLES_PER_MS)
_ONEMS:
	sbiw	XL,1		; in-place subtraction of word value, 2 cycles
	brne	_ONEMS		; 2 cycles if branch, otherwise 1 cycle
	dec	mseconds	; Repeat for set number of milliseconds
	brne	DELAYMS
	ret
 
DELAYSEC:
	ldi	temp,4
_ONESEC:
	ldi	mseconds,250	; one byte can only hold up to 255 in value
	rcall	DELAYMS
	dec	temp		; 4 x 250ms = 1 second
	brne	_ONESEC
	dec	seconds		; Repeat for set number of seconds
	brne	DELAYSEC
	ret
