Log
2022-12-06 Tuesday
Summary of addressing modes and corresponding instructions
The converse to the LPM instruction is the SPM instruction, which in its basic form writes the word stored in R1:R0 to the address specified by the Z register. Looking up the description of SPM in the instruction set is mildly confusing, with some terminology needing to be demystified:
- LPM and SPM use "indirect addressing", i.e. the address stored in the Z-register points to the corresponding location in program space. Data space (SRAM) on the other hand is accessed either indirectly using LD and ST (address stored in the X-, Y- or Z-register) or directly using LDS and STS (16-bit address given in the instruction).
- Available instructions are device-dependent, and are listed in the instruction set summary of the corresponding device datasheet, for example the subset of data transfer instructions for the ATmega16.
- Some larger devices contain an additional 8-bit RAMPZ register that allows the ELPM (extended LPM) instruction (if implemented) to use a 24-bit address. This concatenation is denoted (RAMPZ:Z).
- Specific SPM usage is documented in the device datasheet. In some instances, it can also set bootloader lock bits. The full instruction description is attached below (truncating the examples):
Tying it all together now. Recall the memories of the ATmega16, which consist of:
- Flash program memory of size 16 KB (16,384 bytes) with an endurance of 10,000 write/erase cycles, separated into distinct Bootloader and Application sections for software security.
- SRAM data memory spanning 1,120 bytes (volatile), separated into 32 general purpose registers, 64 I/O registers, and 1,024 bytes of internal data SRAM.
- EEPROM data memory of size 512 bytes with an endurance of 100,000 write/erase cycles. Access is negotiated using the EEPROM control register (EECR), either via manual read/write enable, or using interrupts, while the EEPROM address and data are placed in the corresponding EEARH/EEARL and EEDR registers.
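As a quick host-side check of the data memory figures above, the three regions can be laid end to end to see where each one starts and ends. This is only a sketch in Python (not part of the tutorial code); the helper name `data_memory_map` is made up for illustration.

```python
# Regions of the ATmega16 data address space, with sizes in bytes as listed above.
REGIONS = [
    ("general purpose registers", 32),
    ("I/O registers", 64),
    ("internal SRAM", 1024),
]

def data_memory_map(regions=REGIONS):
    """Return (name, first_address, last_address) with regions laid end to end."""
    layout, start = [], 0
    for name, size in regions:
        layout.append((name, start, start + size - 1))
        start += size
    return layout

if __name__ == "__main__":
    for name, first, last in data_memory_map():
        print(f"{name:26s} 0x{first:04X}-0x{last:04X}")
```

The last address of the SRAM region comes out as 0x045F, which should match the RAMEND value used when initializing the stack pointer.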
The complete description of how data and program space can be accessed is thus summarized below (originally found in datasheet but sorted by alphabetical order):
Program | Effect | Remarks |
---|---|---|
LPM | Load byte from program memory into specified register | Address stored in Z (ZH:ZL) |
SPM | Store contents of R1:R0 to program memory | Address stored in Z (ZH:ZL) |

General | Effect | Remarks |
---|---|---|
MOV | Move byte between registers | - |
MOVW | Move word between register pairs | - |
LDI | Load immediate value into register | - |

I/O | Effect | Remarks |
---|---|---|
IN (OUT) | Read (write) I/O register value into (from) specified register | - |
SBI (CBI) | Set (clear) bit value in I/O register | - |

Data | Effect | Remarks |
---|---|---|
LD (ST) | Load from (store to) SRAM into (from) specified register | Address stored in X, Y, or Z |
LDS (STS) | Load from (store to) SRAM into (from) specified register | Address specified as two-byte value |
PUSH (POP) | Store to (load from) SRAM stack the contents of specified register | - |
Strings
Strings (as per C convention) are null-terminated, so we can define the following. Note that program memory is organized in words, so the assembler will complain (and automatically pad a zero byte) if the total length including the null-terminator is not an even number of bytes.
    .equ NULL = 0x00
    ...
    TXTSTR:
        .db "Hello world!",NULL
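To see when the assembler pads, a rough host-side sketch in Python can count the bytes of the string plus terminator and round up to a whole word. The function name `db_padded_length` is hypothetical and only models the padding rule described above.

```python
def db_padded_length(text, terminator=b"\x00"):
    """Bytes occupied in flash by a .db string plus terminator, rounded up to a whole word."""
    raw = text.encode("ascii") + terminator
    return len(raw) + (len(raw) % 2)

if __name__ == "__main__":
    for text in ("Hello world!", "Hello world"):
        print(repr(text), len(text) + 1, "->", db_padded_length(text))
```

"Hello world!" is 12 characters, so 13 bytes with the terminator, and one pad byte is added to reach 14. Dropping the exclamation mark gives 12 bytes and no padding.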
In the UART receive interrupt vector, the character is read and checked if it matches a certain signature (if-else), and gets forwarded to either SEND_CHAR or SEND_MESSAGE:

    UART_RXC:
        in      CHAR,UDR
        cpi     CHAR,'m'        ; compare char to 'm', sets Z-flag if true
        brne    SEND_CHAR
        rcall   SEND_MESSAGE
        rjmp    FINISH
    SEND_CHAR:
        inc     CHAR
        rcall   RS_SEND
    FINISH:
        reti
If SEND_MESSAGE is invoked, the address at TXTSTR is loaded into Z, then each character is repeatedly loaded and sent until the null-termination is reached.

    SEND_MESSAGE:
        ldi     ZH,HIGH(TXTSTR*2)
        ldi     ZL,LOW(TXTSTR*2)
    WRITE:
        lpm     CHAR,Z+
        tst     CHAR            ; check if null-terminated, i.e. cpi CHAR,0
        breq    WRITE_FINISH
        rcall   RS_SEND
        rjmp    WRITE
    WRITE_FINISH:
        ret
For posterity, the code in RS_SEND to transmit data over USART:

    RS_SEND:
        sbi     UCSRB,TXEN
        sbis    UCSRA,UDRE
        rjmp    RS_SEND
        out     UDR,CHAR
        cbi     UCSRB,TXEN
        ret
With this, I'm finally done with the assembly section of the Kanda AVR tutorials for ATmega16. Next up are the C equivalents, which are more up my alley. To consider: reading up on integrating assembly with C code using avr-gcc with the -x assembler-with-cpp option.
2022-11-27 Sunday
Data indirect addressing and data tables (.DB, LPM)
This time working with the .DB directive (see its description), which essentially stores constants in flash memory. Since flash memory is organized in two-byte words, the .DB directive will pad constants up to two-byte word boundaries. This program space can be addressed using the LPM (load program memory) instruction.
The table below holds 16 bytes, so we declare a constant for its size:

    .equ _TABLE_SIZE_B = 16
    ...
    VALUES:
        .db 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80
        .db 0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01
Same initialization for Timer0, but here we additionally define loop iterators:
    INIT:
        ; Initialize table loop iterators
        ldi     INDEX,0
        ldi     TABLE_SIZE,_TABLE_SIZE_B    ; store value in register for comparison
    MAIN:
    LOOP:
        ; AVR instructions require byte addressing, but assembler processes code in words
        ; Using Z register which can store 16-bit words, and load address of the table
        ldi     ZH,high(2*VALUES)
        ldi     ZL,low(2*VALUES)

        ; Adds iterator value to table address
        ; 'adiw ZH:ZL,K' cannot be used here: it only accepts a constant K (0 to 63),
        ; not a register such as INDEX
        clr     TEMP                ; TEMP must be zero so that adc only adds the carry
        add     ZL,INDEX
        adc     ZH,TEMP

        ; Load Program Memory (LPM) loads one byte pointed to by Z register, which stores
        ; the byte address. Since program memory is organized in 16-bit words, the
        ; least significant bit determines whether the low or high byte is selected.
        ; If no register specified, assume to be written to R0, otherwise syntax is
        ; 'lpm r16,Z' or 'lpm r16,Z+' (for post-increment).
        lpm
        mov     TEMP,r0
        com     TEMP                ; inversion for LED
        out     PORTB,TEMP
        rcall   DELAY

        ; Loop through the entire table
        inc     INDEX
        cp      INDEX,TABLE_SIZE
        brne    LOOP

        ; Reset index and repeat program again
        clr     INDEX
        rjmp    MAIN
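The word-versus-byte addressing is easier to see with a toy model on the host. The Python sketch below treats flash as a list of 16-bit words and mimics how LPM uses bit 0 of Z to pick the low or high byte. The word address `VALUES_WORD_ADDR` is a made-up placeholder, and `lpm` and `pack_db` are illustrative helpers rather than anything from the assembler.

```python
def pack_db(values):
    """Pack bytes into 16-bit words the way .db does: first byte in the low half."""
    padded = list(values) + [0] * (len(values) % 2)
    return [padded[i] | (padded[i + 1] << 8) for i in range(0, len(padded), 2)]

def lpm(flash_words, z):
    """Model of LPM: Z is a byte address, bit 0 selects the low (0) or high (1) byte."""
    word = flash_words[z >> 1]
    return (word >> 8) & 0xFF if z & 1 else word & 0xFF

VALUES_WORD_ADDR = 0x0030   # hypothetical word address of the VALUES label
table = [1 << i for i in range(8)] + [0x80 >> i for i in range(8)]
flash = [0xFFFF] * VALUES_WORD_ADDR + pack_db(table)

z_base = 2 * VALUES_WORD_ADDR   # same doubling as high(2*VALUES):low(2*VALUES)
read_back = [lpm(flash, z_base + index) for index in range(len(table))]

if __name__ == "__main__":
    print([hex(b) for b in read_back])
```

Reading with `z_base + index` returns the table bytes in their declared order, which is why the label has to be doubled before the index is added.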
2022-11-08 Tuesday
Important to note that datasheets are designed to be complete, i.e. all features are dumped in one go to function as a reference. But first reads don't need all that information: how a feature works and the design decisions behind it are more important.
It is thus critical to write notes that have a natural progression of ideas and features.
2022-11-05 Saturday
USART
Continuing from where we left off from the USART topic, this time digging a little deeper into the datasheet.
Actually, the topic is large enough that it warrants a separate description on its own page, located here.
2022-09-09 Friday
Commercial devices have a clock recovery chip that can introduce artifacts in phase measurement.
2022-09-08 Thursday
USART
This topic is a little tricky to understand - it helps to first understand what a serial port is as well as their communication protocols.
- Serial port: A serial communication interface between devices. Used in place of modern USB standards where compatibility and simplicity are needed.
- Serial: One bit is sent at a time, contrasting with parallel port where multiple bits are sent simultaneously.
- Interfaces: Ethernet, USB, hardware compliance with RS-232, etc.
USART stands for "Universal Synchronous/Asynchronous serial Receiver and Transmitter"
Some constants are defined in the assembly code, and it looks like the programmer takes its cue from these as well:

    .equ F_CPU = 8000000                        ; clock frequency external crystal
    .equ BAUDRATE = 19200
    .equ BAUDCONST = (F_CPU/(16*BAUDRATE))-1
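The BAUDCONST expression can be checked on the host. The Python sketch below repeats the integer division the assembler performs and then works backwards to the baud rate the hardware would actually produce. The function names are made up for illustration, and the formula assumes normal-speed asynchronous mode (divide by 16).

```python
def ubrr_for(f_cpu, baud):
    """Baud rate register value, using integer division like the assembler expression."""
    return f_cpu // (16 * baud) - 1

def actual_baud(f_cpu, ubrr):
    """Baud rate produced by a given register value."""
    return f_cpu / (16 * (ubrr + 1))

F_CPU, BAUDRATE = 8_000_000, 19_200
ubrr = ubrr_for(F_CPU, BAUDRATE)
error = actual_baud(F_CPU, ubrr) / BAUDRATE - 1

if __name__ == "__main__":
    print(ubrr, round(actual_baud(F_CPU, ubrr), 1), f"{error:.2%}")
```

For an 8 MHz crystal and 19200 baud this gives 25, which fits in UBRRL alone and explains why UBRRH is simply cleared. The resulting rate is roughly 19231 baud, about 0.16% fast.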
First need to initialize the serial port, and allow receiver to interrupt.
    INIT:
        ; Setup serial port
        ldi     TEMP,BAUDCONST
        out     UBRRL,TEMP
        clr     TEMP
        out     UBRRH,TEMP
        sbi     UCSRB,RXCIE     ; receiver interrupt enable bit
        sbi     UCSRB,RXEN      ; receiver enable bit
        sei                     ; global interrupt enable bit
which brings us to the interrupt code when the receiver is receiving:
    UART_RXC:
        in      CHAR,UDR        ; read character
        rcall   RS_SEND         ; resend command
        reti

This triggers the transmit code and also updates the LED attached to port B. Here the transmitter is enabled only when data transmission is triggered (TXEN), and disabled once done.

    RS_SEND:
        sbi     UCSRB,TXEN      ; transmit enable bit set
        sbis    UCSRA,UDRE      ; skip the jump once the data register is empty
        rjmp    RS_SEND         ; otherwise keep waiting
        inc     CHAR            ; increment character value to be bounced back
        out     UDR,CHAR        ; transmit
        rcall   LED_DISPLAY     ; set display
        cbi     UCSRB,TXEN      ; transmit enable bit clear
        ret
    LED_DISPLAY:
        com     CHAR            ; invert for LEDs (active low)
        out     PORTB,CHAR
        ret
2022-09-04 Sunday
Simple single-channel ADC (ADMUX, ADCSRA, ADCL, ADCH)
The ADC has 10-bit resolution for single-ended channel reads (lower resolution if a differential input with gain is used). First define the registers that will hold the ADC result:

    .def ADCSTOREH = r17    ; 8-bits since left-aligned (ADLAR = 1, see below)
    .def ADCSTOREL = r18    ; 2-bits
We configure the ADC to have the following properties:
- Voltage reference (for max value) set to AVCC
- Left-aligned ADC values: high register of ADC store will contain the 8-bit MSB, and low register contains the last 2-bit LSB
- Clock input into ADC clock prescaled by 64x
- ADC is enabled
    INIT:
        ...
        ; Use 'ori' to perform in-place OR with an immediate value
        ldi     TEMP,(1<<REFS0)|(1<<ADLAR)  ; '01' -> AVCC reference, left-aligned
        ori     TEMP,ADCCHANNEL             ; set MUX for single-ended channel
        out     ADMUX,TEMP

        ; Initialize control register for ADC
        ; Prescaler can be smaller for higher rate if lower resolution needed
        ldi     TEMP,(1<<ADPS2)|(1<<ADPS1)  ; prescaler = 64 (125kHz for 8MHz clock)
        out     ADCSRA,TEMP
        sbi     ADCSRA,ADEN                 ; enable ADC

A conversion can be triggered manually by setting the ADSC bit in the ADCSRA control register; the hardware clears ADSC again when the conversion completes. The ADC value itself is read from ADCL (first) and ADCH (last).

    MAIN:
        sbi     ADCSRA,ADSC     ; trigger first conversion
    LOOP:
        sbic    ADCSRA,ADSC     ; skip next instruction if bit cleared
        rjmp    LOOP

        ; Conversion completed - read
        in      ADCSTOREL,ADCL  ; read low byte first
        in      ADCSTOREH,ADCH  ; read high byte last
        ...
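To make the left-aligned layout concrete, the two bytes can be stitched back into the 10-bit value on the host. This Python sketch assumes ADLAR = 1 as configured above; the 5.0 V reference is an assumed AVCC value and the function names are hypothetical.

```python
def adc_left_adjusted(adch, adcl):
    """Rebuild the 10-bit result: ADCH holds bits 9..2, the top two bits of ADCL hold bits 1..0."""
    return (adch << 2) | (adcl >> 6)

def adc_to_volts(value, vref=5.0, bits=10):
    """Convert a raw reading to volts, with vref standing in for AVCC (assumed 5.0 V)."""
    return value * vref / (1 << bits)

if __name__ == "__main__":
    raw = adc_left_adjusted(0x80, 0x00)
    print(raw, adc_to_volts(raw))
```

If 8-bit resolution is enough, ADCH alone already carries the eight most significant bits, which is the usual reason for choosing left alignment.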
Because of a lack of convenient jumper wire, I used the ADC7 channel with the ribbon cable as a substitute instead.
Understanding structure of a datasheet
Important to recognize that the microcontroller is typically designed in a modular fashion, hence the different sections for each specific feature.
It helps to give the relevant section a quick read, which will provide the following information:
- Rough overview of available capabilities (performance, features, warnings and pitfalls)
- Relevant registers for configuration and interfacing (last section)
- Instruction set
After this, scroll to the following sections for additional information:
- Pin configuration section to check pin locations
- Register summary section to see location of registers
- Instruction set to see possible interfacing instructions
Question: Is it possible to separate application code from boilerplate in assembly files?
2022-08-28 Sunday
Read the datasheet Section 12 for I/O pins and Section 13 for external interrupts.
Some terminology to clear first:
- A pull-up resistor pulls the voltage on a line HIGH when nothing else drives it (this requires the connected component to be of sufficiently high impedance and the pull-up resistor to be of sufficiently low resistance). A PU of higher resistance is weak (slower to reach HIGH, but less current drawn during switching), while a PU of lower resistance is strong.
- An output pin has low impedance to HIGH/LOW voltages, allowing it to act as a current source/sink. This contrasts with an input pin with high impedance, which allows the device to simply read the HIGH/LOW state of the pin.
A nice schematic from a set of ENGR40M lecture notes, showing output and input configurations:
An external switch only defines a connection. The pull-up (pull-down) resistor converts the connection state into a HIGH/LOW voltage state, by "pulling up" ("down") the input pin when the switch is not being driven low by a connection to ground.
Hardware timer with external interrupt (MCUCR, GICR)
Section 12 describes the required configuration for DDRx and PORTx registers. Note that tri-state refers to the high impedance (high-Z) state, which helps to remove the device's influence on the circuit. Note also that switching from tri-state input (0b00) to output high (0b11) has to pass through an intermediate state, either pull-up enabled (0b01) or output low (0b10). If the environment is not of sufficiently high impedance and the pull-up state is a critical non-allowed state, the pull-up disable (PUD) bit in the SFIOR register can be set to disable the pull-ups.
Since the LEDs on the STK-200 are connected to HIGH, we run it in output active-low with 0b11 as the initial state. The switch is set up in high-Z state, i.e. 0b00. The partial code below enables the input port as an interrupt:

    EXT_INT0:
        dec     COUNT           ; decrement count register
        out     PORTB,COUNT     ; send it to LEDs
        reti
    INIT:
        ; Note: Ports are input (high-impedance) by default
        ; Initialize external interrupt INT0
        ldi     TEMP,(1<<ISC01) ; Interrupt Sense Control - trigger on falling edge
        out     MCUCR,TEMP
        ldi     TEMP,(1<<INT0)  ; enable INT0 interrupt
        out     GICR,TEMP
        sei                     ; enable interrupts globally
Fuses
Had some experience with this already, here's a recap:
- BODEN and BODLEVEL for brown-out detection enable and level, default disabled
- SUT1:0 for start-up time to allow capacitance to first discharge, value derived in conjunction with CKSEL, default 65 ms
- CKSEL3:0 to select internal/external clocks, default internal 1 MHz
- OCDEN enables on-chip debugging, default disabled
- JTAGEN enables JTAG interface, default enabled
- SPIEN enables SPI interface, default enabled
- CKOPT sets whether full rail-to-rail oscillation signal is used, for noisy environments, default unprogrammed (disabled)
- EESAVE sets whether EEPROM is preserved upon chip erase, e.g. for serials, default disabled
- BOOTSZ1:0 sets the size of the boot block to store bootloader, default 1024 words at 0x1c00 to 0x1fff
- BOOTRST sets whether reset vector points to application code or boot code, default application
According to the datasheet, for an external crystal oscillator of 8 MHz, CKOPT = 1, CKSEL3:1 = 111, so setting CKSEL0 = 1 and SUT1:0 = 10 gives 16K CK startup time and 4.1ms delay after RESET.
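When entering fuses as a hex byte in a programmer, it helps to pack the bits on the host first. The Python sketch below assumes the ATmega16 fuse low byte is ordered BODLEVEL, BODEN, SUT1:0, CKSEL3:0 from bit 7 down to bit 0, which should be verified against the datasheet before use. CKOPT lives in the fuse high byte and is not covered here. The function name is made up for illustration.

```python
def fuse_low_byte(cksel, sut, boden=1, bodlevel=1):
    """Pack the fuse low byte; a bit value of 0 means programmed, 1 means unprogrammed."""
    return (bodlevel << 7) | (boden << 6) | ((sut & 0b11) << 4) | (cksel & 0b1111)

# Factory default from the recap above: internal 1 MHz clock, 65 ms start-up time
default = fuse_low_byte(cksel=0b0001, sut=0b10)
# External 8 MHz crystal setting derived above: CKSEL3:0 = 1111, SUT1:0 = 10
crystal = fuse_low_byte(cksel=0b1111, sut=0b10)

if __name__ == "__main__":
    print(hex(default), hex(crystal))
```

Under these assumptions the default comes out as 0xE1 and the crystal setting as 0xEF, with brown-out detection left disabled in both cases.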
2022-08-27 Saturday
With regards to styling for Assembly, pick a style and forget about it: it's all a matter of project convention. A preliminary style guide:
- ASM mnemonics and directives in lowercase. Everything else in uppercase.
- Tab size 8. Only labels in first column, only directives and instructions in second column (or comments), everything else in third column without spaces, comments last.
- Comments to always begin with semicolon.
Hardware timer with Timer0 (TCCR0, TCNT0, TIFR, TOV0)
Implementing a timer using the 8-bit Timer0, which is the simplest hardware timer to operate (Timer2 is also 8-bit, while Timer1 is 16-bit). With a timer prescaler of 1024 and an internal clock of 1MHz, the overflow rate is:
$$ \text{T0 overflow rate} = \frac{10^6}{1024} \div 256 \approx 3.81 \text{ Hz} = \frac{1}{262 \text{ ms}} $$

    init:
        ldi     TEMP,5
        out     TCCR0,TEMP      ; set timer prescaler to 1024
        clr     TEMP
        out     TCNT0,TEMP      ; clear timer0 count I/O register
    main:
        rcall   wait
        ...
    wait:
        in      TEMP,TIFR       ; read T0 flags into register
        sbrs    TEMP,TOV0       ; test T0 overflow, skip next line if set
        rjmp    wait            ; loop if not set
        ldi     TEMP,(1<<TOV0)  ; TOV0 is cleared by writing a logic one to it
        out     TIFR,TEMP       ; clear only the T0 overflow flag
        ret
This is a hardware timer - T0 runs in the background and the TOV0 flag will be automatically set when the counter rolls over from TCNT0 = 255 back to 0.
Again, use a for-loop construct to repeat the instruction multiple times:
    delay:
        ldi     REPEAT,8        ; FOR loop start
    wait:
        ...
        dec     REPEAT
        brne    wait            ; FOR loop end
        ret
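The resulting delay can be estimated on the host from the clock, prescaler and counter width. This Python sketch only repeats the arithmetic above and ignores the few cycles spent in the loop itself; `overflow_period_s` is a hypothetical helper name.

```python
def overflow_period_s(f_cpu, prescaler, counter_bits=8):
    """Seconds between overflows of a free-running counter."""
    return prescaler * (1 << counter_bits) / f_cpu

period = overflow_period_s(1_000_000, 1024)   # one Timer0 overflow
total = 8 * period                            # REPEAT = 8 overflows

if __name__ == "__main__":
    print(f"{period * 1000:.3f} ms per overflow, {total:.3f} s per delay call")
```

With REPEAT set to 8 the delay is a little over two seconds.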
Hardware timer with interrupt vector (TOIE0, TIMSK, sei, reti)
Instead of manually checking the overflow flag, we can rely on the hardware to automatically interrupt the program flow. The interrupt is triggered only if:
- Interrupts are enabled (sei to set the I-bit in status register SREG)
- Timer0 interrupt is enabled (set the TOIE0 bit in TIMSK)
- An overflow occurred (TOV0 is set, will be automatically cleared after the interrupt is triggered)
During an interrupt, the program counter is pushed to the stack; when reti is called, it is popped off the stack and loaded back into the program counter.
The interrupt vector table can be found in the datasheet. Because the program addresses words, but the .org directive is in bytes, the address for the Timer0 overflow interrupt is 0x24 in bytes (word address 0x12, doubled).
    ...
    ; Termed the Interrupt Service Routine (ISR), or Interrupt Handler
    .org 0x24                   ; T0 OVF interrupt vector address
        in      TEMP,PINB       ; read current value of PORTB (via PINB)
        com     TEMP            ; invert one's complement, i.e. NOT
        out     PORTB,TEMP      ; write to PORTB
        reti                    ; return from interrupt
    INIT:
        ...
        ldi     TEMP,5
        out     TCCR0,TEMP      ; set timer prescaler to 1024
        clr     TEMP
        out     TCNT0,TEMP      ; clear T0 count register
        ldi     TEMP,(1<<TOIE0) ; "T0 Overflow Interrupt Enable" is bit-0 in TIMSK
        out     TIMSK,TEMP      ; write to "Timer Interrupt MaSK" register
        sei                     ; enable all interrupts by setting I-bit in SREG
    MAIN:
        rjmp    MAIN            ; Loop forever
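The word and byte addresses of a vector follow directly from its number in the datasheet table. The Python sketch below assumes each ATmega16 vector slot is two words wide and that vectors are numbered from 1 (RESET), with the Timer0 overflow being vector 10. The helper names are made up for illustration.

```python
VECTOR_SIZE_WORDS = 2   # each ATmega16 vector slot is two program words wide

def vector_word_address(vector_number):
    """Word address of a vector, with vector numbers starting at 1 (RESET)."""
    return (vector_number - 1) * VECTOR_SIZE_WORDS

def vector_byte_address(vector_number):
    """Byte address is the word address doubled."""
    return 2 * vector_word_address(vector_number)

TIMER0_OVF = 10   # vector number taken from the datasheet table

if __name__ == "__main__":
    print(hex(vector_word_address(TIMER0_OVF)), hex(vector_byte_address(TIMER0_OVF)))
```

Which of the two values the .org directive expects depends on the assembler in use, so it is worth confirming against the generated listing file.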
Hardware timer with Timer1 (TCNT1, TCCR1, OCR1A)
Since a JMP instruction is two words long, the same size as each interrupt vector slot on the ATmega16, we can quickly assemble an interrupt table as the boilerplate code:

    .org 0
        jmp     INIT
        jmp     EXT_INT0        ; External 0 interrupt vector
        jmp     EXT_INT1        ; External 1 interrupt vector
        jmp     TIM2_COMP       ; Timer 2 Compare interrupt vector
        jmp     TIM2_OVF        ; Timer 2 Overflow interrupt vector
        jmp     TIM1_CAPT       ; Timer 1 Capture interrupt vector
        jmp     TIM1_COMPA      ; Timer 1 CompareA interrupt vector
        jmp     TIM1_COMPB      ; Timer 1 CompareB interrupt vector
        jmp     TIM1_OVF        ; Timer 1 Overflow interrupt vector
        jmp     TIM0_OVF        ; Timer 0 Overflow interrupt vector
        jmp     SPI_HANDLE      ; SPI Transmit interrupt vector
        jmp     UART_RXC        ; UART RX Complete interrupt vector
        jmp     UART_DRE        ; UDR Empty interrupt vector
        jmp     UART_TXC        ; UART TX Complete interrupt vector
        jmp     ADC_COMP        ; ADC Conversion Complete interrupt vector
        jmp     EE_RDY          ; EEPROM Ready interrupt vector
        jmp     ANA_COMP        ; Analogue Comparator interrupt vector
        jmp     TWI             ; TWI interrupt vector
        jmp     EXT_INT2        ; External 2 interrupt vector
        jmp     TIMER0_COMP     ; Timer 0 Compare Match vector
        jmp     SPM_RDY         ; Store Program Memory Ready interrupt vector

    EXT_INT0:       reti
    EXT_INT1:       reti
    TIM2_COMP:      reti
    TIM2_OVF:       reti
    TIM1_CAPT:      reti
    TIM1_COMPA:
        in      TEMP,PINB
        com     TEMP
        out     PORTB,TEMP
        reti
    TIM1_COMPB:     reti
    TIM1_OVF:       reti
    TIM0_OVF:       reti
    SPI_HANDLE:     reti
    UART_RXC:       reti
    UART_DRE:       reti
    UART_TXC:       reti
    ADC_COMP:       reti
    EE_RDY:         reti
    ANA_COMP:       reti
    TWI:            reti
    EXT_INT2:       reti
    TIMER0_COMP:    reti
    SPM_RDY:        reti

    INIT:
        ...
Running Timer1 is not that much more involved compared to Timer0. However, since the counter is 16-bit, waiting for the timer to overflow is not particularly feasible, so we load a max counter value TMAX into the output compare OCR1A, and enable the interrupt vector for when T1 counter hits this value and clears timer on compare (CTC mode).
    .equ F_CPU = 1000000
    .equ seconds = 1
    .equ TMAX = (F_CPU/1024) * seconds
    ...
    INIT:
        ...
        ; Initialize Timer1
        clr     TEMP
        out     TCNT1H,TEMP         ; clear Timer1 counter
        out     TCNT1L,TEMP
        out     TCCR1A,TEMP         ; disable output compare for Timer1
        ldi     TEMP,0x0d           ; set CTC mode with OCR1A source and 1024 prescaler
        out     TCCR1B,TEMP         ; - read WGM13:0 table and CS12:0 table for details
        ldi     TEMP,high(TMAX)     ; write MAX value to OCR1A
        out     OCR1AH,TEMP
        ldi     TEMP,low(TMAX)
        out     OCR1AL,TEMP
        ldi     TEMP,(1<<OCIE1A)    ; enable Timer1 interrupt via output compare A
        out     TIMSK,TEMP

        ; Disable analog comparator
        ldi     TEMP,(1<<ACD)       ; ACD = Analog Comparator Disable, to save power
        out     ACSR,TEMP
        sei
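The interrupt period that TMAX produces can be checked on the host. One detail worth keeping in mind is that in CTC mode the counter runs from 0 up to and including the compare value, so the period is one tick longer than the value written. The Python sketch below shows the effect; `ctc_period_s` is a hypothetical helper name.

```python
F_CPU = 1_000_000
PRESCALER = 1024

def ctc_period_s(ocr, f_cpu=F_CPU, prescaler=PRESCALER):
    """Compare-match period in CTC mode: the counter counts 0..ocr inclusive, i.e. ocr + 1 ticks."""
    return (ocr + 1) * prescaler / f_cpu

tmax = (F_CPU // PRESCALER) * 1   # same integer arithmetic as the .equ expression

if __name__ == "__main__":
    for ocr in (tmax - 1, tmax):
        print(ocr, f"{ctc_period_s(ocr):.6f} s")
```

Both 975 and 976 land within about 0.06% of one second, so the difference only matters where drift accumulates over long runs.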
Important note: 16-bit registers must be read low byte first, and written high byte first. Accessing the low byte triggers a 16-bit read/write of the whole register in a *single* clock cycle, using an internal 8-bit temporary register to hold the high byte.
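A toy model makes the access order easier to remember. The Python class below is an illustrative simplification (not how the silicon is specified in detail): reading the low byte latches the high byte into a temporary register, and writing the low byte commits both bytes at once.

```python
class Timer16:
    """Toy model of a 16-bit register behind an 8-bit bus with a shared temporary latch."""

    def __init__(self, value=0):
        self.value = value & 0xFFFF
        self.temp = 0

    def read_low(self):
        self.temp = self.value >> 8       # high byte is latched at the same instant
        return self.value & 0xFF

    def read_high(self):
        return self.temp                  # latched copy, not the live counter

    def write_high(self, byte):
        self.temp = byte & 0xFF           # only buffered, register unchanged

    def write_low(self, byte):
        self.value = (self.temp << 8) | (byte & 0xFF)   # both bytes committed together

timer = Timer16(0x01FF)
low = timer.read_low()
timer.value = 0x0200                      # counter ticks between the two reads
high = timer.read_high()
snapshot = (high << 8) | low

timer.write_high(0x03)                    # high byte first, as for OCR1AH
timer.write_low(0xD0)                     # low byte commits the full 16-bit value
```

The snapshot stays consistent even though the counter ticked between the two reads, and the write sequence mirrors the high-then-low order used for TCNT1 and OCR1A.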
2022-08-26 Friday
Calculate exact cycles for timer from specified clock speed
Starting a quick progress log instead, since it will be easier to keep up than writing full-blown tutorials.
A more precise timing definition, although still plagued with a couple problems:
- Actual timing delay behaviour seems to visibly lag within a minute. This shouldn't be due to the small additional instructions, since the chip is running on a 1MHz clock. Strange.
- The sbiw/brne loop takes only 3 cycles on its last iteration, when brne does not branch.

    .equ F_CPU = 1000000                    ; defines target clock speed in Hz
    .equ CYCLES_PER_MS = (F_CPU/4000)       ; extra factor 4 since each loop is 4 cycles
    ...
    MAIN:
        ldi     seconds,2
        rcall   DELAYSEC
        ...

    ; Precise timing subroutines
    ; Works by decrementing a word
    DELAYMS:
        ldi     XH,HIGH(CYCLES_PER_MS)
        ldi     XL,LOW(CYCLES_PER_MS)
    _ONEMS:
        sbiw    XL,1            ; in-place subtraction of word value, 2 cycles
        brne    _ONEMS          ; 2 cycles if branch, otherwise 1 cycle
        dec     mseconds        ; Repeat for set number of milliseconds
        brne    DELAYMS
        ret
    DELAYSEC:
        ldi     temp,4
    _ONESEC:
        ldi     mseconds,250    ; one byte can only hold up to 255 in value
        rcall   DELAYMS
        dec     temp            ; 4 x 250ms = 1 second
        brne    _ONESEC
        dec     seconds         ; Repeat for set number of seconds
        brne    DELAYSEC
        ret
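A cycle count on the host shows how much the bookkeeping instructions add to each millisecond. The Python sketch below assumes the usual AVR timings (ldi and dec 1 cycle, sbiw 2 cycles, brne 2 cycles when taken and 1 when not) and counts one pass of DELAYMS that branches back for another millisecond. The function name is hypothetical.

```python
def delayms_cycles_per_ms(loop_count):
    """Cycles for one DELAYMS pass that branches back for another millisecond."""
    load = 2                        # ldi XH + ldi XL
    inner = 4 * loop_count - 1      # sbiw (2) + brne (2); the final brne falls through in 1
    tail = 1 + 2                    # dec mseconds + brne DELAYMS (taken)
    return load + inner + tail

F_CPU = 1_000_000
loops = F_CPU // 4000               # CYCLES_PER_MS as defined in the listing
cycles = delayms_cycles_per_ms(loops)
overshoot = cycles / (F_CPU / 1000) - 1

if __name__ == "__main__":
    print(cycles, f"{overshoot:.2%}")
```

Under these assumptions each nominal millisecond takes 1004 cycles, about 0.4% long, or roughly a quarter of a second per minute. If the observed lag is larger than that, the tolerance of the internal RC oscillator is a plausible contributor, though that is only a guess.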