TI II: Computer Architecture
ISA and Assembly

CISC vs. RISC
Data Types
Addressing
Instructions
Assembler
Content

1. Introduction
   - Single Processor Systems
   - Historical overview
   - Six-level computer architecture

2. Data representation and Computer arithmetic
   - Data and number representation
   - Basic arithmetic

3. Microarchitecture
   - Microprocessor architecture
   - Microprogramming
   - Pipelining

4. Instruction Set Architecture
   - CISC vs. RISC
   - Data types, Addressing, Instructions
   - Assembler

5. Memories
   - Hierarchy, Types
   - Physical & Virtual Memory
   - Segmentation & Paging
   - Caches
Where are we now?

Level 5  
Problem-oriented language level  
Translation (Compiler)

Level 4  
Assembly language level  
Translation (Assembler)

Level 3  
Operating system machine level  
Partial interpretation (operating system)

Level 2  
ISA (Instruction Set Architecture) level  
Interpretation (microprogram) or direct execution

Level 1  
Microarchitecture level  
Hardware

Level 0  
Digital logic level  
Hardware

Examples from the figure:
- Languages and translators: Java, C#, C++, C, Haskell, Cobol, … (javac, VS .NET)
- Intermediate code: Java Byte Code, MSIL/CIL (JVM, CLR; JIT/interpreter)
- Operating systems: Unix, Windows, iOS
- ISAs: x86, PPC, ARM, … (microprogram/none)
- Microarchitectures: Netburst, ISSE, ASX, <none>, …
- Hardware: Core i7, ARM9, PPC620, …
Information

This chapter is mainly for some background information about ISAs, assembler etc.

The assignments plus tutorials teach you how to program with assembler

We skip the operating system part as this is covered in the course “Operating Systems and Computer Networks”
- Subsystems, Interrupts and System Calls
- Processes
- Memory
- Scheduling
- I/O and File System
- Booting, Services, and Security
The classical SW/HW boundary

- **FORTRAN 90 program** compiled to ISA program
- **C program** compiled to ISA program

ISA level

ISA program executed by microprogram or hardware

Software lies above the ISA level, hardware below it.

What is visible to the user?
Execution of operations

Interpretation or compilation of instructions from the ISA-layer
- High-level languages translated into instructions from the ISA
- Pure RISC processors can directly execute ISA instructions
  - i.e. the hardware (HW) can execute these instructions
- More complex processors use **microprogramming** for flexibility and compatibility reasons:
  - hardware changes leave ISA unchanged
  - reprogramming to circumvent hardware problems
  - more powerful ISA – simpler compilers
## Example: A selection of the Pentium II integer instructions

### Moves

- MOV DST, SRC
- PUSH SRC
- POP DST
- XCHG DS1, DS2
- LEA DST, SRC

### Arithmetic

<table>
<thead>
<tr><th>Instruction</th><th>Description</th></tr>
</thead>
<tbody>
<tr><td>ADD DST, SRC</td><td>Add SRC to DST</td></tr>
<tr><td>SUB DST, SRC</td><td>Subtract SRC from DST</td></tr>
<tr><td>MUL SRC</td><td>Multiply EAX by SRC (unsigned)</td></tr>
<tr><td>IMUL SRC</td><td>Multiply EAX by SRC (signed)</td></tr>
<tr><td>DIV SRC</td><td>Divide EDX:EAX by SRC (unsigned)</td></tr>
<tr><td>IDIV SRC</td><td>Divide EDX:EAX by SRC (signed)</td></tr>
<tr><td>ADC DST, SRC</td><td>Add SRC and the carry bit to DST</td></tr>
<tr><td>SBB DST, SRC</td><td>Subtract SRC &amp; carry from DST</td></tr>
<tr><td>INC DST</td><td>Add 1 to DST</td></tr>
<tr><td>DEC DST</td><td>Subtract 1 from DST</td></tr>
<tr><td>NEG DST</td><td>Negate DST (subtract it from 0)</td></tr>
</tbody>
</table>

### Boolean

- AND DST, SRC
- OR DST, SRC
- XOR DST, SRC

### Shift/rotate

<table>
<thead>
<tr><th>Instruction</th><th>Description</th></tr>
</thead>
<tbody>
<tr><td>SAL/SAR DST, #</td><td>Arithmetic shift DST left/right # bits</td></tr>
<tr><td>SHL/SHR DST, #</td><td>Logical shift DST left/right # bits</td></tr>
<tr><td>ROL/ROR DST, #</td><td>Rotate DST left/right # bits</td></tr>
<tr><td>RCL/RCR DST, #</td><td>Rotate DST through carry # bits</td></tr>
</tbody>
</table>

### Test/compare

- CMP DST, SRC

### Transfer of control

<table>
<thead>
<tr><th>Instruction</th><th>Description</th></tr>
</thead>
<tbody>
<tr><td>JMP ADDR</td><td>Jump to ADDR</td></tr>
<tr><td>Jxx ADDR</td><td>Conditional jumps based on flags</td></tr>
<tr><td>CALL ADDR</td><td>Call procedure at ADDR</td></tr>
<tr><td>RET</td><td>Return from procedure</td></tr>
<tr><td>IRET</td><td>Return from interrupt</td></tr>
<tr><td>LOOPxx</td><td>Loop until condition met</td></tr>
<tr><td>INT n</td><td>Initiate a software interrupt</td></tr>
<tr><td>INTO</td><td>Interrupt if overflow bit is set</td></tr>
</tbody>
</table>

### Strings

- LODS
- STOS
- MOVS
- CMPS
- SCAS

### Condition codes

- STC
- CLC
- CMC
- STD
- CLD
- STI
- CLI
- PUSHFD
- POPFD
- LAHF
- SAHF

### Miscellaneous

- SWAP DST
- CWQ
- CWDE
- ENTER SIZE, LV
- LEAVE
- NOP
- HLT
- IN AL, PORT
- OUT PORT, AL
- WAIT

---

**Note:**
- **SRC = source**
- **DST = destination**
- **LV = # locals**
- **# = shift/rotate count**
Some call it “complete”…

COMPLEX INSTRUCTION SET COMPUTER (CISC)
Complex Instruction Set Computer (CISC) 1

Reasons for CISC
- Execution of complex instructions faster than execution of equivalent programs with the same function
- Micro programming allows for more complex instructions
- More complex instructions lead to shorter programs thus faster loading (transfer-rate gap between CPU internally and CPU-main memory)
- Bigger is better – more instructions sound more powerful...it’s marketing!
- Direct support of programming constructs of higher languages using more complex instructions (e.g. string compare)
- Support of specialized powerful compilers
- Compatibility (we can do everything like before plus xyz)
- Support of special purpose applications (e.g. matrix operations)

⇒ more transistors/chip, higher programming languages and special purpose applications favor “complex” instructions
Complex Instruction Set Computer (CISC) 2

Reasons against CISC
- Much faster main memories (argument of the 80’s, today again a problem) and the use of cache memory speed up program execution
- Micro programs are more and more complex (so where is the difference between programming and micro programming…)
- Replacement of complex instructions using several simpler (much faster) instructions
- Longer development cycles
- Very complex control units
- Large micro programs (potentially with errors)
- Real programs use only a small fraction of the large instruction set frequently!
CISC – really needed?

System programs in XPL on IBM/360:
- 90% of all instructions used: 10 different instructions
- 95% of all instructions used: 21 different instructions
- 99% of all instructions used: 30 different instructions

COBOL programs on IBM/370:
- 90% of all instructions used: 26 different instructions
- 99% of all instructions used: 48 different instructions
- Only 84 different instructions used at all
The 10 most used instructions in SPECint92 for Intel x86

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Percentage [%]</th>
</tr>
</thead>
<tbody>
<tr>
<td>load</td>
<td>22</td>
</tr>
<tr>
<td>conditional branch</td>
<td>20</td>
</tr>
<tr>
<td>compare</td>
<td>16</td>
</tr>
<tr>
<td>store</td>
<td>12</td>
</tr>
<tr>
<td>add</td>
<td>8</td>
</tr>
<tr>
<td>and</td>
<td>6</td>
</tr>
<tr>
<td>sub</td>
<td>5</td>
</tr>
<tr>
<td>move register-register</td>
<td>4</td>
</tr>
<tr>
<td>call</td>
<td>1</td>
</tr>
<tr>
<td>return</td>
<td>1</td>
</tr>
<tr>
<td><strong>Total</strong></td>
<td><strong>95</strong></td>
</tr>
</tbody>
</table>
Limitations of CISC architectures

Usage of instructions (80/20 rule)
- Only 20% of the instructions used frequently
- Many powerful instructions (rarely used)
- Complex instruction format(s)
- Micro programming

Critical problem: number of cycles per instruction (CPI)
- Many classical CISC architectures have CPI >> 2
  - Motorola MC68030: CPI = 4-6
  - Intel 80386: CPI = 4-5
- BUT: optimized code for Pentium/Itanium/… – typical CPI ≈ 1
  - Superscalar processors e.g. issuing 4 instructions in parallel could theoretically go down to 0.25, but:
    floating-point, SIMD, branch mis-predictions, memory latency …
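
The effect of the CPI on execution time can be checked with a little arithmetic. The following Python sketch uses the CPI figures quoted above; the instruction count, the clock rate and the stall figure are made-up illustration values, not measurements, and the function names are chosen for this example only.

```python
def execution_time(instructions: int, cpi: float, clock_hz: float) -> float:
    """Performance equation: time = instruction count * CPI / clock rate."""
    return instructions * cpi / clock_hz


def ideal_cpi(issue_width: int) -> float:
    """Best-case CPI of a superscalar core issuing `issue_width` instructions per cycle."""
    return 1.0 / issue_width


def effective_cpi(base_cpi: float, stall_cycles_per_instruction: float) -> float:
    """Stalls (memory latency, branch mispredictions, ...) add to the base CPI."""
    return base_cpi + stall_cycles_per_instruction


# Hypothetical workload: 1 million instructions at 100 MHz (assumed values).
N, F = 1_000_000, 100e6
print(execution_time(N, 4.5, F))          # classical CISC with a CPI of 4-5
print(execution_time(N, 1.0, F))          # optimized code with a CPI of about 1
print(ideal_cpi(4))                       # theoretical limit for 4-way issue
print(effective_cpi(ideal_cpi(4), 0.5))   # assumed 0.5 stall cycles per instruction
```

With these assumed numbers, half a stall cycle per instruction already triples the theoretical CPI of 0.25, which is why the limit is rarely reached in practice.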
REDUCED INSTRUCTION SET COMPUTER (RISC)
Reduced Instruction Set Computer (RISC)

The instruction set consists of
- only a few, absolutely necessary instructions (≤ 128),
- few instruction formats (≤ 4) with a fixed instruction length of 32 bits, and
- only a few addressing modes (≤ 4).

This allows a much simpler implementation of the control unit and saves space on the chip for additional units.

Many general-purpose registers, at least 32, are needed.

Memory access is only possible via special load and store instructions.
Register/register architecture

Memory access is via load and store operations only.

All other instructions work on the CPU registers only, e.g., arithmetic operations load operands from registers and store results in registers only.

This basic principle is called
- register/register architecture or
- load/store architecture and is typical for many (original) RISC computers.
Additional features of RISC computers

If possible, all instructions should be implemented in a way that they finish within a single processor cycle.

Consequence: pure RISC processors do not use micro programming
- RISC processors introduced enhanced pipelining mechanisms (today, many processors use pipelining for the micro instructions, e.g., Pentium 4 and up).

Furthermore, the early RISC processors had a software-controlled pipeline (compilers inserted delay NOPs, introduced delayed jumps etc.) instead of special hardware.

Aside
- PC processors like the Pentium 4 (and up) use micro programming; the internal micro architecture (NetBurst) is rather RISC, while the ISA is CISC.
RISC

Reasons for
- Single-chip implementation (yes, today “everything” fits on a single chip)
- Shorter development cycles
- Higher clock rates, pipelining
- Re-use of saved chip space for, e.g., cache

Reasons against
- Bottleneck in the memory interface, today again main memory is much slower compared to internal registers/cache
- Space on a chip is not that critical anymore
Early RISC processors

IBM 801 project
- Started as early as 1975 by Cocke at IBM Research, Yorktown Heights

MIPS project
- Started 1981, Hennessy at Stanford University
- The first fully functioning chip was finished in 1983 (NMOS VLSI)
- This project was the starting point of the MIPS corporation.

Berkeley RISC project
- Started 1980, Patterson at UC Berkeley
- Origin of the SPARC processor
- Basic principle of overlapping register windows
- The instruction set contained only 31 instructions with a fixed length of 32 bits and only 2 instruction formats
- Only 3 addressing modes
RISC from today’s perspective

What is left from the early ideas of RISC (in many controllers, RISC processors, RISC cores), e.g.:
- Instruction pipelining
- Load/Store architecture
- Large register file, e.g.,
  - 32 general-purpose and
  - 32 floating point registers
- A unified instruction format, e.g., 32 bit
- Few addressing modes
- No micro programming

What has been added since, e.g.:
- SIMD/vector processing
- Hypervisor/virtualization support
- Different versions depending on use (embedded, 32/64/128 bit, …)
- Compressed instructions
- …
Differences between RISC and CISC processors 1

Pure RISC processors prefer the Harvard architecture
- Separate memory for instructions and data (operands) and, thus, two address and two data busses
  ➔ parallel fetching of instruction(s) and operand(s) possible

Simplified versions
1. Two separate bus systems up to the L1 caches, but only one main memory/unified L2/L3 cache (cheaper, standard with today’s systems)
2. Only a single, multiplexed bus system
Differences between RISC and CISC processors 2

Control unit
- Hard-wired
- Instruction register is a simple FIFO queue
- Each pipeline stage has its own register
- A simple combinational circuit can “interpret” the OpCodes in each stage directly

Register file
- Consists of a large number of (general purpose) registers
- Supports the simultaneous selection of several registers
  - E.g. 4-port register file: simultaneous write to R0, R1 and read from R2, R3
Differences between RISC and CISC processors 3

Execution unit
- Uses a load/store architecture, loads the operands in parallel via 2 operand busses from the register file and writes back the result within the same clock cycle into the register file.
- No direct connection between ALU and external bus, all data transfer done via load/store via the register file.

Exception
- Register bypass to avoid pipeline hazards (forwarding techniques)
The future of RISC?

Today
- Again, processors much faster than RAM/interconnection
- Frequent load/stores as bottleneck
- Integration of > 5 billion transistors on a single chip feasible

Thus
- Development of VLIW (Very Long Instruction Word) processors
- HP/Intel Itanium, very short pipeline, compiler does most of the work, powerful ISA (IA-64), fewer memory accesses
- Commercial failure! But still some use: [https://en.wikipedia.org/wiki/Very_long_instruction_word](https://en.wikipedia.org/wiki/Very_long_instruction_word)

The future?
- RISC considered harmful? Not in embedded systems…
- Will legacy stay there forever…
  - seems so with x86-64 and similar…
- New opportunities with open architectures like RISC-V
Questions & Tasks

- Check the instruction set of current processors (Intel, ARM, AMD, …) – is it RISC or CISC?
- Which of the initial RISC ideas survived?
- There is a trade-off between RISC and CISC – when to favor RISC? Why use CISC?
Examples of ISAs

CISC, RISC, VM & ADDRESSING
Typical processors: Pentium, SPARC, JVM

The following processors serve as examples for different ISAs

Pentium, Core i3/5/7/9
- Originates from the classical x86 CISC architectures, today 64 bit
- Still CISC to the outside, but many RISC features inside
- Other CISC examples: Athlon, …, Ryzen, many classical ancestors (VAX, IBM, …)

UltraSPARC (Ultra Scalable Processor Architecture)
- Originates from the early RISC projects (like the MIPS processor)
- Still RISC, although extended in many ways
- Can be found in, e.g., SUN computers, industry control systems
- Other RISC examples: Alpha, MIPS, Power, PowerPC, RISC-V

JVM (Java Virtual Machine)
- Either seen as virtual processor or real HW (e.g., picoJava)
- Stack machine (operations take place on a stack)
- Heavily influenced by Java
- Other virtual machine examples: CLR, P-Code, WebAssembly
Instruction formats

Example: $C := A + B$

a) Zero-address instruction
   - stack architectures: push $A$; push $B$; ADD; pop $C$

b) One-address instruction
   - Accumulator is the implicit operand and result: load $A$; ADD $B$; store $C$

c) Two-address instruction
   - One operand becomes result: ADD $B,A$; move $A,C$

d) Three-address format
   - ADD $C,A,B$
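
To make the difference between the formats tangible, the following Python sketch interprets the zero-address and the one-address version of $C := A + B$. The tuple encoding of the instructions and the function names are assumptions of this example, not part of any real ISA.

```python
def run_stack_machine(program, memory):
    """Execute a zero-address (stack) program on a dict-based memory."""
    stack = []
    for op, *args in program:
        if op == "push":                    # push memory[name] onto the stack
            stack.append(memory[args[0]])
        elif op == "add":                   # pop two operands, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "pop":                   # pop the top of stack into memory[name]
            memory[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown instruction: {op}")
    return memory


def run_accumulator_machine(program, memory):
    """Execute a one-address program; the accumulator is the implicit operand."""
    acc = 0
    for op, name in program:
        if op == "load":
            acc = memory[name]
        elif op == "add":
            acc += memory[name]
        elif op == "store":
            memory[name] = acc
        else:
            raise ValueError(f"unknown instruction: {op}")
    return memory


zero_address = [("push", "A"), ("push", "B"), ("add",), ("pop", "C")]
one_address = [("load", "A"), ("add", "B"), ("store", "C")]

print(run_stack_machine(zero_address, {"A": 2, "B": 3})["C"])
print(run_accumulator_machine(one_address, {"A": 2, "B": 3})["C"])
```

Both programs compute the same result; the zero-address version needs four instructions without explicit operands for ADD, the one-address version three instructions with one operand each.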
Addressing modes

Addressing mode
- Different possibilities to calculate the address of an operand or the branch target address in the memory

Classical
- Address of operands or branch target address directly given in the instruction (absolute address)

Disadvantages
- Absolute addresses are fixed during programming and, thus, the compiled program determines its location in memory
- Accessing, e.g., dynamic tables requires a change in the absolute address for each instruction – ROMs cannot be used as instruction memory!
Addressing modes

Solution
- Calculation of the address during runtime (dynamic address calculation)
Addressing modes – some examples

Be aware:
- Naming may differ depending on the architecture
- Not all processors support all modes

<table>
<thead>
<tr>
<th>Addressing mode</th>
<th>Instruction</th>
<th>Result</th>
</tr>
</thead>
<tbody>
<tr>
<td>Absolute/direct</td>
<td>jmpa address</td>
<td>PC := address</td>
</tr>
<tr>
<td>PC relative</td>
<td>jmpo offset</td>
<td>PC := PC’ + offset</td>
</tr>
<tr>
<td>Register indirect</td>
<td>jmpr R</td>
<td>PC := R</td>
</tr>
<tr>
<td>Sequential execution</td>
<td>nop</td>
<td>PC := PC’</td>
</tr>
<tr>
<td>Register (direct)</td>
<td>mul R1,R2,R3</td>
<td>PC := PC’; R1 := R2*R3</td>
</tr>
<tr>
<td>Base plus offset</td>
<td>load R1,R2,val</td>
<td>PC := PC’; R1 := mem(R2 + val)</td>
</tr>
<tr>
<td>Immediate</td>
<td>add R1,R2,val</td>
<td>PC := PC’; R1 := R2 + val</td>
</tr>
<tr>
<td>Implicit</td>
<td>load x</td>
<td>PC := PC’; accumulator := x</td>
</tr>
</tbody>
</table>

Indexed absolute, base plus index (plus offset), scaled, autoincrement/-decrement, …
- See [https://en.wikipedia.org/wiki/Addressing_mode](https://en.wikipedia.org/wiki/Addressing_mode)
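
The "Result" column of the table can be read as a small set of rules. The Python sketch below spells out some of them; it assumes that PC’ is the address of the following instruction and that every instruction is 4 bytes long, and the function and parameter names are invented for this illustration.

```python
def next_pc(mode, pc, operand=None, regs=None, instr_len=4):
    """Return the new PC for the control-flow rows of the table.

    PC' is taken to be pc + instr_len (an assumption of this sketch).
    """
    pc_next = pc + instr_len
    if mode == "absolute":            # jmpa address: PC := address
        return operand
    if mode == "pc_relative":         # jmpo offset:  PC := PC' + offset
        return pc_next + operand
    if mode == "register_indirect":   # jmpr R:       PC := R
        return regs[operand]
    if mode == "sequential":          # nop:          PC := PC'
        return pc_next
    raise ValueError(f"unknown mode: {mode}")


def load_base_plus_offset(regs, mem, dst, base, val):
    """load R1,R2,val: R1 := mem(R2 + val)."""
    regs[dst] = mem[regs[base] + val]


regs = {"R1": 0, "R2": 8, "R": 64}
mem = {12: 99}
load_base_plus_offset(regs, mem, "R1", "R2", 4)
print(regs["R1"])
print(next_pc("pc_relative", 100, 16))
```

Only the absolute mode fixes the target at programming time; all other modes compute it at runtime from the PC or a register.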
Example: register indirect addressing

**Example:**

LD R1, (A0)   (load)

*(Load the register R1 with the content of the memory word the address register A0 points to)*
Questions & Tasks

- Several or more complex addressing modes vs. load/store architecture – advantages/disadvantages?
- What is the advantage of using a virtual machine such as JVM, CLR, etc.?
- What is the motivation behind relative / logical or virtual / physical addresses?
PROCEDURES, TRAPS, INTERRUPTS & CO.
Procedures, Traps, Interrupts & Co.

Many reasons for non-linear program execution
- Jumps, branches
- Procedure calls, subroutines, method invocation
- Multithreading, parallel processes, co-routines
- Hardware interrupts (processor external reasons)
- Traps, software interrupts (processor internal reasons)

Non-linear program execution is the normal case!
- And invalidates standard cache content ...
  - Trace caches can help (more later)
Program execution

- Linear, without branches
- With branches
Procedure call (subroutine, method, ...)

[Figure: (a) calling procedure: A is called from the main program (CALL); (b) called procedure: A returns to the main program (RETURN)]
Co-routine call
(parallel process, multithreading,...)
How to handle exceptions?

During runtime exceptions may occur, i.e., interruptions of the programmed flow of instructions

Reasons
- Errors in the operating system while executing application programs or errors in the hardware
- Requests of external components for attention of the processor
- …

Exceptional situations may require the interruption of the currently running program or even its termination

Are exceptions exceptional?
Exception handling

Handling of exceptions requires specialized routines (Interrupt Service Routine, ISR)

A specialized hardware component (interrupt system, interrupt controller) typically supports the selection and activation of an ISR

An ISR has the same structure as a subprogram, but there are also some differences
ISR vs. subprogram/subroutine

<table>
<thead>
<tr>
<th>Activity</th>
<th>Subroutine/subprogram</th>
<th>ISR</th>
</tr>
</thead>
<tbody>
<tr>
<td>Activation</td>
<td>call subroutine</td>
<td>INT instruction or hardware activation</td>
</tr>
<tr>
<td>Return after completion</td>
<td>RET instruction (return from subroutine)</td>
<td>RETI instruction (return from interrupt)</td>
</tr>
<tr>
<td>Calculation of starting address</td>
<td>Starting address of called subroutine written in calling program</td>
<td>Starting address of called ISR determined via interrupt table</td>
</tr>
<tr>
<td>Saving status</td>
<td>Subroutine call typically saves only PC on a stack</td>
<td>ISR calls save the PC and PSW on a stack</td>
</tr>
</tbody>
</table>
ISR vs. subprogram/subroutine

The processor always executes subroutine calls as programmed.

However, the processor executes an ISR only if it is triggered and the Interrupt Enable bit in the PSW is set.

Reasons for exception handling
- External reasons (asynchronous events): incoming data, device ready, mouse movement, …
- Internal reasons (synchronous events): system calls, debugging, change of privilege, …
External reasons for exceptions

RESET
- Reset of the processor, e.g., triggered by a button, power supply, watchdog timer, …

HALT
- Stop the execution of the processor, e.g., to avoid access conflicts on the system bus during DMA (direct memory access)

ERROR
- Call of an error handler routine, e.g., due to bus errors

Interrupt
- Interrupt request triggered by an external device, e.g., to announce incoming data of an input device
- 2 types: maskable/non-maskable (NMI)
Internal reasons for exceptions

Software Interrupts
- INT instruction in the program triggers an interrupt (system calls, debugging, …)

Traps
- Exceptions caused by internal events, e.g., overflow, division by zero, stack overflow, …
Example: Calculation of the start address of an Interrupt Service Routine (ISR)

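[Figure: The interrupt source raises INT and, once acknowledged with INTA, puts its interrupt vector number on the data bus. The number is scaled (x 2, 4, ...) and added to the base address held in the base address register. The resulting address selects the entry of the interrupt vector table in memory that holds the start address of the ISR.]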
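
The address calculation can be sketched in a few lines of Python. The table is modeled as a dictionary from memory address to stored value; the function names, the base address and the example entries are assumptions for illustration.

```python
def vector_entry_address(base_address: int, vector_number: int, entry_size: int) -> int:
    """Address of the table entry: base address + vector number * entry size (scaling)."""
    return base_address + vector_number * entry_size


def isr_start_address(memory: dict, base_address: int, vector_number: int,
                      entry_size: int = 4) -> int:
    """Read the ISR start address stored in the interrupt vector table."""
    return memory[vector_entry_address(base_address, vector_number, entry_size)]


# Hypothetical table at base address 0x0000 with 4-byte entries.
memory = {0x0008: 0x1000, 0x000C: 0x2000}
print(hex(vector_entry_address(0x0000, 3, 4)))   # entry for vector number 3
print(hex(isr_start_address(memory, 0x0000, 3)))
```

The scaling factor equals the size of one table entry, which is why the figure shows factors such as 2 or 4.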

Typical steps of an ISR I

1. Interrupt activation
2. Finalize the instruction currently in execution
3. Check whether it is a software interrupt or an internal/external hardware interrupt
4. Check if Interrupt Enable bit is set
   ➔ allow interrupt
5. If it is a hardware interrupt: find source of interrupt, activate INTA (interrupt acknowledge)
6. Save PSW and PC on stack
7. Reset Interrupt Enable bit to avoid an additional interrupt in this stage
Typical steps of an ISR II

8. Calculate start address of ISR (e.g. based on the interrupt vector table) and load it into the PC
9. Execute the Interrupt Service Routine:
   - Push the used registers onto the stack
   - Set the Interrupt Enable bit to allow other interrupts (i.e. interrupts can interrupt interrupts!)
   - Do the real work of the ISR
   - Pop the registers from stack
   - Return from interrupt handling using the IRET instruction
10. Restore PSW and PC and continue with the interrupted program

Be aware: if the ISR is too large it blocks the computer!
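
The bookkeeping part of these steps (check the Interrupt Enable bit, save PSW and PC, clear the bit, load the ISR address, restore on return) can be sketched as a small Python model. It is a simplification for maskable interrupts only: the PSW is reduced to the Interrupt Enable bit, and the class and function names are invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Cpu:
    pc: int = 0
    interrupt_enable: bool = True          # the Interrupt Enable bit of the PSW
    stack: list = field(default_factory=list)


def enter_interrupt(cpu: Cpu, vector_table: dict, vector_number: int) -> bool:
    """Steps 4 and 6-8: check IE, save PSW and PC, clear IE, load the ISR address."""
    if not cpu.interrupt_enable:
        return False                                    # request stays pending
    cpu.stack.append((cpu.interrupt_enable, cpu.pc))    # save PSW (here: only IE) and PC
    cpu.interrupt_enable = False                        # no further interrupt in this stage
    cpu.pc = vector_table[vector_number]                # start address of the ISR
    return True


def return_from_interrupt(cpu: Cpu) -> None:
    """Step 10: restore PSW and PC and continue the interrupted program."""
    cpu.interrupt_enable, cpu.pc = cpu.stack.pop()


cpu = Cpu(pc=0x100)
table = {5: 0x4000}                                     # hypothetical vector table
print(enter_interrupt(cpu, table, 5), hex(cpu.pc))
print(enter_interrupt(cpu, table, 5))                   # blocked while IE is cleared
return_from_interrupt(cpu)
print(hex(cpu.pc), cpu.interrupt_enable)
```

An ISR that sets the Interrupt Enable bit again, as in step 9, would allow the second request in this example to be accepted and nested.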
**Interrupt vector table**

Typically located at a well-known address, e.g., in ROM (starting at address 0000:0000 for 80x86 processors)

Contains the start addresses of the ISRs

The source of an interrupt creates an interrupt number pointing at the entry in the interrupt vector table

Can be way more complex …
Examples of interrupt vector tables

<table>
<thead>
<tr>
<th>Interrupt Source</th>
<th>Interrupt Flag</th>
<th>System Interrupt</th>
<th>Word Address</th>
<th>Priority</th>
</tr>
</thead>
<tbody>
<tr>
<td>Power-up, external reset, watchdog, flash password, illegal instruction fetch</td>
<td>PORIFG</td>
<td>Reset</td>
<td>0FFFEh</td>
<td>31, highest</td>
</tr>
<tr>
<td>NMI, oscillator fault, flash memory access violation</td>
<td>NMIIFG, SMIIFG</td>
<td>(non)-maskable</td>
<td>0FFFCh</td>
<td>30</td>
</tr>
<tr>
<td>device-specific</td>
<td>OGFIFG, AGOFIFG</td>
<td>(non)-maskable</td>
<td>0FFFAh</td>
<td>29</td>
</tr>
<tr>
<td>Watchdog timer</td>
<td>WDTIFG</td>
<td>maskable</td>
<td>0FFF4h</td>
<td>26</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Exception number</th>
<th>IRQ number</th>
<th>Offset</th>
<th>Vector</th>
</tr>
</thead>
<tbody>
<tr><td>16+n</td><td>n</td><td>0x0040+4n</td><td>IRQn</td></tr>
<tr><td>18</td><td>2</td><td>0x0048</td><td>IRQ2</td></tr>
<tr><td>17</td><td>1</td><td>0x0044</td><td>IRQ1</td></tr>
<tr><td>16</td><td>0</td><td>0x0040</td><td>IRQ0</td></tr>
<tr><td>15</td><td>-1</td><td>0x003C</td><td>Systick</td></tr>
<tr><td>14</td><td>-2</td><td>0x0038</td><td>PendSV</td></tr>
<tr><td>13</td><td></td><td></td><td>Reserved</td></tr>
<tr><td>12</td><td></td><td></td><td>Reserved</td></tr>
<tr><td>11</td><td>-5</td><td>0x002C</td><td>SVCall</td></tr>
<tr><td>10</td><td></td><td></td><td>Reserved</td></tr>
<tr><td>9</td><td></td><td></td><td>Reserved</td></tr>
<tr><td>8</td><td></td><td></td><td>Reserved</td></tr>
<tr><td>7</td><td></td><td></td><td>Reserved</td></tr>
<tr><td>6</td><td>-10</td><td>0x0018</td><td>Usage fault</td></tr>
<tr><td>5</td><td>-11</td><td>0x0014</td><td>Bus fault</td></tr>
<tr><td>4</td><td>-12</td><td>0x0010</td><td>Memory management fault</td></tr>
<tr><td>3</td><td>-13</td><td>0x000C</td><td>Hard fault</td></tr>
<tr><td>2</td><td>-14</td><td>0x0008</td><td>NMI</td></tr>
<tr><td>1</td><td></td><td>0x0004</td><td>Reset</td></tr>
<tr><td></td><td></td><td>0x0000</td><td>Initial SP value</td></tr>
</tbody>
</table>

TI MSP430

https://www.ti.com/

Espressif ESP32-S2

https://www.espressif.com/

ARM Cortex-M4

https://www.arm.com/
Time sequence of multiple interrupts

Computer with 3 I/O devices
- Printer, priority 2
- Hard disc, priority 4
- RS232, priority 5

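Time axis of the figure (time marks 0, 10, 15, 20, 25, 35, 40):

- 0 to 10: user program
- 10 to 15: printer ISR
- 15 to 25: RS232 ISR; the disk interrupt (priority 4) arrives at 20 and is held pending
- 25 to 35: disk ISR (the RS232 ISR finishes, the disk interrupt occurs)
- 35 to 40: printer ISR continues after the disk ISR finishes
- from 40: user program continues after the printer ISR finishes

Stack: holds the interrupted contexts, i.e., the user program and, while the RS232 and disk ISRs run, also the printer ISR.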
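
The time sequence can be reproduced with a small preemptive priority simulation in Python. The arrival times follow the time marks of the figure; the service time of 10 units per ISR and all names in the code are assumptions chosen so that the run matches those marks. A higher number means a higher priority, as on the slide.

```python
def simulate(requests, end_time):
    """Preemptive priority scheduling in steps of one time unit.

    requests: list of (arrival_time, name, priority, service_time).
    Returns (start, end, name) segments; the user program runs when no ISR is ready.
    """
    remaining = {name: service for _, name, _, service in requests}
    segments = []
    for t in range(end_time):
        ready = [(prio, name) for arrival, name, prio, _ in requests
                 if arrival <= t and remaining[name] > 0]
        running = max(ready)[1] if ready else "User program"
        if running != "User program":
            remaining[running] -= 1
        if segments and segments[-1][2] == running:
            segments[-1] = (segments[-1][0], t + 1, running)   # extend current segment
        else:
            segments.append((t, t + 1, running))
    return segments


# Assumed workload: each ISR needs 10 time units in total.
requests = [(10, "Printer ISR", 2, 10),
            (15, "RS232 ISR", 5, 10),
            (20, "Disk ISR", 4, 10)]
for start, end, name in simulate(requests, 45):
    print(f"{start:2d}-{end:2d}  {name}")
```

The disk request at time 20 cannot preempt the RS232 ISR because its priority is lower; it preempts nothing and simply runs before the lower-priority printer ISR is resumed.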
Handling of multiple interrupt sources

Cyclic polling of interrupt sources by the interrupt controller (interrupt flag for each source in a status register of the controller).

If the interrupt flag for a component is set

- Stop cyclic polling and start ISR for the source.

Several new/additional interrupt requests possible during or after the execution of an ISR.

- Two alternative ways of treating new/additional interrupts
Polling: method 1

Continue cyclic polling at the interrupt source following the last served source ➔ all interrupt sources have an equal chance of being served ("fair" processor sharing)
Polling: method 2

Cyclic polling always starts at a pre-determined first source. Different sources automatically get different priorities. Polling favors components with higher priority.
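
The two polling methods differ only in where the scan of the interrupt flags starts. A minimal Python sketch, with the flags modeled as a list of booleans and function names invented for this example:

```python
def poll_round_robin(flags, last_served):
    """Method 1: start polling at the source after the one served last."""
    n = len(flags)
    for step in range(1, n + 1):
        source = (last_served + step) % n
        if flags[source]:
            return source
    return None                      # no interrupt flag is set


def poll_fixed_priority(flags):
    """Method 2: always start at source 0, so lower indices have higher priority."""
    for source, flag in enumerate(flags):
        if flag:
            return source
    return None


flags = [True, False, True, True]    # sources 0, 2 and 3 request service
print(poll_round_robin(flags, last_served=0))
print(poll_fixed_priority(flags))
```

With source 0 served last, method 1 moves on to source 2, while method 2 serves source 0 again as long as its flag is set.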
Polling

Priorities of the old 80286 (and many other x86):

<table>
<thead>
<tr>
<th>Priority</th>
<th>Exception</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>RESET Reset/initialization</td>
</tr>
<tr>
<td>1</td>
<td>TRAP Exception during instruction execution</td>
</tr>
<tr>
<td></td>
<td>INT Software interrupt</td>
</tr>
<tr>
<td>2</td>
<td>TRACE Single step execution</td>
</tr>
<tr>
<td>3</td>
<td>NMI Non-maskable interrupt</td>
</tr>
<tr>
<td>4</td>
<td>... Co-processor error</td>
</tr>
<tr>
<td>5</td>
<td>IRQ Maskable interrupts</td>
</tr>
</tbody>
</table>

Disadvantage of polling:

Using software for cyclic prioritization and identification of interrupts is too time consuming.
Daisy chaining using hardware

Use specialized hardware for prioritization and identification of interrupts.

Example: Chaining of interrupt sources to a priority chain (Interrupt Daisy Chain).

Each interrupt source uses dedicated hardware to connect to its predecessor and successor (decentralized prioritization).

The first source in the chain automatically has the highest priority.

The priority of the other sources depends on their position in the chain.
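
The priority rule of the chain can be sketched in a few lines. The model below is an assumption made for illustration: each source either keeps the acknowledge signal (if it has a request pending) or passes it on to its successor. The function name `daisy_chain_grant` is hypothetical.

```python
def daisy_chain_grant(requests):
    """Return the position of the source that receives the acknowledge signal.

    requests[i] is True if the source at chain position i has a pending
    interrupt request. Position 0 is the first source in the chain.
    """
    grant = True   # acknowledge signal enters the chain at the first source
    winner = None
    for position, requesting in enumerate(requests):
        if grant and requesting:
            winner = position
            grant = False  # this source blocks the signal for all successors
    return winner


print(daisy_chain_grant([False, True, True]))    # 1: closest requesting source wins
print(daisy_chain_grant([True, True, False]))    # 0
print(daisy_chain_grant([False, False, False]))  # None: nobody requested service
```

In real hardware this propagation happens in combinational logic inside the sources, so no software loop is needed; the loop here only mirrors the signal path.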
Questions & Tasks

- Name the key differences between a subprogram and an ISR!
- What is the purpose of an ISR?
- Could a computer operate without ISRs?
- Looking at the typical steps of an ISR: which step(s) should be uninterruptible?
- How to handle multiple interrupt sources?
ASSEMBLER
Where are we now?

- **Level 5**: Problem-oriented language level
  - Java, C#, C++, C, Haskell, Cobol, ...
  - Translation (compiler): Javac, VS .NET

- **Level 4**: Assembly language level
  - Java Byte Code, MSIL/CIL
  - Translation (assembler); JVM, CLR; JIT/Interpreter

- **Level 3**: Operating system machine level
  - Unix, Windows, iOS
  - Partial interpretation (operating system)

- **Level 2**: ISA (Instruction Set Architecture) level
  - x86, PPC, ARM, ...
  - Interpretation (microprogram) or direct execution: microprogram/none

- **Level 1**: Microarchitecture level
  - Netburst, ISSE, ASX, <none>, ...
  - Hardware

- **Level 0**: Digital logic level
  - Hardware: Core i7, ARM, PPC620, ...
Compiler vs. Assembler

Assembler
- Source: symbolic representation of a machine language (assembly language)
- Destination: numerical representation of the machine language (instructions from ISA)
- Examples: inline assembler in Visual Studio, MASM, ilasm, asm (gcc, Linux), MMIXal, nasm, ...

Compiler
- Source: high-level language (depends on the definition of "high" ...), e.g., C, Java, C#, Cobol, Modula, C++, ...
- Destination: assembler language or, if the compiler has a built-in assembler, the numerical representation of the machine language (instructions from ISA)
- Examples: C#-Compiler in Visual Studio, gcc, cc, javac, ...

Assembler language
- Pure assembler language: 1:1 mapping onto ISA instructions
- But additionally: symbolic names, addresses, labels
Reasons for an assembler level

Full access to HW features
- (almost) all (visible) registers are exposed to the assembler language, all flags can be read or set, many "hidden" features can be used
  - E.g., try accessing the Pentium performance counters from within Java

Performance
- Optimized code for special purposes
  - Real-time: exact number of CPU cycles can be counted, guaranteed access times to registers, deterministic response times of sub-routines (again: try Java and real-time – and see what a garbage collector does...)
  - Low memory footprint: no useless overhead, optimized loops, etc.

But much harder to program in assembler ...
- Thus typically combined with, e.g., C – only performance critical parts of a program will be tuned via (inline) assembler (https://en.wikipedia.org/wiki/Inline_assembler)
Examples for assembler: \( N = I + J \)

**Pentium**

```assembly
FORMULA: MOV EAX, I ; register EAX = I
         ADD EAX, J ; register EAX = I + J
         MOV N, EAX ; N = I + J
```

```assembly
I DD 3 ; reserve 4 bytes initialized to 3
J DD 4 ; reserve 4 bytes initialized to 4
N DD 0 ; reserve 4 bytes initialized to 0
```

**SPARC**

```assembly
FORMULA: SETHI %HI(I),%R1      ! R1 = high-order bits of the address of I
         LD [%R1+%LO(I)],%R1   ! R1 = I
         SETHI %HI(J),%R2      ! R2 = high-order bits of the address of J
         LD [%R2+%LO(J)],%R2   ! R2 = J
         NOP                   ! wait for J to arrive from memory
         ADD %R1,%R2,%R2       ! R2 = R1 + R2
         SETHI %HI(N),%R1      ! R1 = high-order bits of the address of N
         ST %R2,[%R1+%LO(N)]   ! N = I + J
```

```assembly
I: .WORD 3 ! reserve 4 bytes initialized to 3
J: .WORD 4 ! reserve 4 bytes initialized to 4
N: .WORD 0 ! reserve 4 bytes initialized to 0
```
Pseudo instructions

Pseudo instructions or assembler directives
- Help a lot for assembler programming
- Depend on designer of the assembler, not the ISA

Examples (MASM for Pentium)
- DB allocate storage for one or more (initialized) bytes
- DW allocate storage for one or more (initialized) 16 bit words
- DD allocate storage for one or more (initialized) 32 bit double words
- PROC start a procedure
- MACRO start a macro definition
- INCLUDE fetch and include another file
- IF start conditional assembly based on a given expression
- PUBLIC export a name defined in the module
Macros vs. procedures

Example macro: swap P, Q

```
SWAP MACRO P, Q
    MOV EAX, P
    MOV EBX, Q
    MOV Q, EAX
    MOV P, EBX
ENDM
```

Differences

<table>
<thead>
<tr>
<th>Item</th>
<th>Macro</th>
<th>Procedure</th>
</tr>
</thead>
<tbody>
<tr>
<td>When is the call made?</td>
<td>During assembly</td>
<td>During program execution</td>
</tr>
<tr>
<td>Is the body inserted into the object program every place the call is made?</td>
<td>Yes</td>
<td>No</td>
</tr>
<tr>
<td>Is a call instruction inserted into the object program and later executed?</td>
<td>No</td>
<td>Yes</td>
</tr>
<tr>
<td>Must a return instruction be used after the call is done?</td>
<td>No</td>
<td>Yes</td>
</tr>
<tr>
<td>How many copies of the body appear in the object program?</td>
<td>One per macro call</td>
<td>1</td>
</tr>
</tbody>
</table>

Macros are inserted "textually" into the assembler program each time a call is made, and formal parameters are replaced by actual parameters (e.g., the above macro can be used for SWAP A,B as well as SWAP X,Y).
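
Textual macro expansion with parameter substitution can be illustrated by a tiny preprocessor. This is a minimal sketch, not how MASM is implemented: the function `expand_macros` is hypothetical, it assumes a parameterized `SWAP MACRO P, Q` definition, and it ignores nested macros, local labels, and conditional assembly.

```python
import re


def expand_macros(source_lines):
    """Replace every macro call by the macro body with actual parameters."""
    macros = {}     # macro name -> (formal parameters, body lines)
    output = []
    current = None  # name of the macro currently being defined, if any
    for line in source_lines:
        tokens = line.replace(",", " ").split()
        if current is not None:
            if tokens == ["ENDM"]:
                current = None                       # end of the definition
            else:
                macros[current][1].append(line.strip())
        elif len(tokens) >= 2 and tokens[1] == "MACRO":
            current = tokens[0]                      # start of a definition
            macros[current] = (tokens[2:], [])
        elif tokens and tokens[0] in macros:
            formals, body = macros[tokens[0]]
            mapping = dict(zip(formals, tokens[1:]))  # formal -> actual
            for body_line in body:
                # Substitute whole identifiers only, leave everything else as is.
                output.append(re.sub(r"[A-Za-z_]\w*",
                                     lambda m: mapping.get(m.group(0), m.group(0)),
                                     body_line))
        else:
            output.append(line.strip())
    return output


program = [
    "SWAP MACRO P, Q",
    "    MOV EAX, P",
    "    MOV EBX, Q",
    "    MOV Q, EAX",
    "    MOV P, EBX",
    "ENDM",
    "SWAP A, B",
    "SWAP X, Y",
]
for line in expand_macros(program):
    print(line)
```

The output contains the four-instruction body twice, once with A and B and once with X and Y, which is why macros cost one copy of the body per call.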
The assembler process

Line-by-line translation in a single pass does not work
- Forward references: symbols are used before being defined ...

Solution: two-pass translator, or single pass plus conversion into an intermediate format
- Pass: one complete reading of the source

First pass
- Check syntax
- Create a symbol table, opcode table, literal table
- Determine instruction length (opcode, operands, ...) to keep track of addresses

Second pass
- Generate object code (*.o, *.obj, ...)
- Generate information for the linker
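
The two passes can be sketched for a toy machine. Everything specific here is an assumption for illustration: the mnemonics, the opcode numbers in `OPCODES`, the one-word instruction format (opcode * 256 + address), and the `WORD` directive do not belong to any real ISA or assembler.

```python
OPCODES = {"HALT": 0, "LOAD": 1, "ADD": 2, "STORE": 3, "JMP": 4}  # hypothetical


def parse(line):
    """Split a source line into (label or None, list of fields)."""
    line = line.split(";")[0].strip()          # drop the comment
    label = None
    if ":" in line:
        label, line = line.split(":", 1)
        label = label.strip()
    return label, line.split()


def pass_one(lines):
    """First pass: build the symbol table using a location counter."""
    symbols, location = {}, 0
    for line in lines:
        label, fields = parse(line)
        if label:
            if label in symbols:
                raise ValueError(f"symbol defined twice: {label}")
            symbols[label] = location
        if fields:
            location += 1                      # every statement is one word long
    return symbols


def pass_two(lines, symbols):
    """Second pass: generate code, all symbols are known by now."""
    code = []
    for line in lines:
        _, fields = parse(line)
        if not fields:
            continue
        mnemonic, operands = fields[0], fields[1:]
        if mnemonic == "WORD":                 # pseudo instruction: data word
            code.append(int(operands[0]))
        else:
            address = symbols[operands[0]] if operands else 0
            code.append(OPCODES[mnemonic] * 256 + address)
    return code


PROGRAM = [
    "        JMP START   ; forward reference to START",
    "I:      WORD 3",
    "J:      WORD 4",
    "N:      WORD 0",
    "START:  LOAD I",
    "        ADD J",
    "        STORE N",
    "        HALT",
]
symbols = pass_one(PROGRAM)
print(symbols)                     # {'I': 1, 'J': 2, 'N': 3, 'START': 4}
print(pass_two(PROGRAM, symbols))  # [1028, 3, 4, 0, 257, 514, 771, 0]
```

The first statement uses `START` before it is defined. A single pass could not encode it, but after the first pass its address (4) is in the symbol table.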
Generation of an executable binary program

Figure: program generation process

- Source procedures 1, 2, and 3 are each processed by the translator
- The translator produces object modules 1, 2, and 3
- The linker combines the object modules into one executable binary program
Example Structure of an Object Module / File

Different formats exist (e.g. COFF, ELF)

Relocation is often done by the MMU (or the code is position independent), but the loader can also relocate

<table>
<thead>
<tr>
<th>Part of the object module</th>
</tr>
</thead>
<tbody>
<tr>
<td>Identification</td>
</tr>
<tr>
<td>Entry point table</td>
</tr>
<tr>
<td>External reference table</td>
</tr>
<tr>
<td>Machine instructions and constants</td>
</tr>
<tr>
<td>Relocation dictionary</td>
</tr>
<tr>
<td>End of module</td>
</tr>
</tbody>
</table>
Example object modules

Each module has its own address space starting at 0
Objects before/after linking and relocation

Before relocation and linking

After relocation and linking
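
What the linker does between the two states can be sketched as follows. The module layout is a simplified assumption loosely following the object module parts (entry point table, external reference table, relocation dictionary). The dictionary keys and the function `link` are hypothetical, and memory words are modeled as plain integers.

```python
def link(modules):
    """Combine object modules into one memory image starting at address 0."""
    # Step 1: assign each module a base address in the common address space.
    base, next_free = {}, 0
    for module in modules:
        base[module["name"]] = next_free
        next_free += len(module["code"])

    # Step 2: build a global symbol table from all entry point tables.
    global_symbols = {}
    for module in modules:
        for symbol, offset in module["entry_points"].items():
            global_symbols[symbol] = base[module["name"]] + offset

    # Step 3: copy the code, relocate local addresses, resolve external ones.
    image = []
    for module in modules:
        words = list(module["code"])
        for offset in module["relocation"]:
            words[offset] += base[module["name"]]    # module-relative -> absolute
        for offset, symbol in module["external_refs"]:
            words[offset] = global_symbols[symbol]   # fill in the placeholder
        image.extend(words)
    return image


module_a = {
    "name": "A",
    "code": [10, 2, 0, 99],            # word 1: local address 2, word 2: placeholder
    "entry_points": {"A_ENTRY": 0},
    "relocation": [1],
    "external_refs": [(2, "B_ENTRY")],
}
module_b = {
    "name": "B",
    "code": [20, 1, 0],                # word 1: local address 1, word 2: placeholder
    "entry_points": {"B_ENTRY": 0},
    "relocation": [1],
    "external_refs": [(2, "A_ENTRY")],
}
print(link([module_a, module_b]))      # [10, 2, 4, 99, 20, 5, 0]
```

Module B is placed at address 4, so its local address 1 becomes 5, and the reference from A to `B_ENTRY` is filled in with 4.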
Dynamic linking

Before EARTH is called

After EARTH has been called and linked
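
The idea of linking a procedure only when it is first called can be sketched with a table of links that are filled in on demand. This is a loose analogy, not a description of a concrete operating system: the class `LinkageTable`, its fields, and the dictionary standing in for procedures on disk are all hypothetical.

```python
class LinkageTable:
    """Resolve a procedure on its first call, then reuse the stored link."""

    def __init__(self, library):
        self.library = library   # stands in for procedures not yet linked
        self.resolved = {}       # name -> procedure, filled in on first call
        self.link_faults = 0     # how often the linker had to be invoked

    def call(self, name, *args):
        if name not in self.resolved:
            # First call: the link is still invalid, so the linker is invoked,
            # locates the procedure, and stores its address in the table.
            self.link_faults += 1
            self.resolved[name] = self.library[name]
        # All later calls go through the already resolved link.
        return self.resolved[name](*args)


def earth():
    return "EARTH was called"


table = LinkageTable({"EARTH": earth})
print(table.link_faults)     # 0: before EARTH is called, nothing is linked
print(table.call("EARTH"))   # first call triggers linking
print(table.call("EARTH"))   # second call uses the stored link
print(table.link_faults)     # 1: the linker ran only once
```

Procedures that are never called are never linked, which is the main benefit of this scheme.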
Shared libraries

DLL (Dynamic Link Library, Windows), shared library (Unix)
- Save a lot of memory as their code appears only once in memory
- Many processes "share" the same code (instructions)

A lot more info given in OS and compiler courses!
Questions & Tasks

- Why use assembler today?
- Why use macros? They typically require more space…
- When is the starting address of an object determined? (Many answers possible … When do we really need the address at the latest?)
Summary

Soft-/Hardware boundary
Complex Instruction Set Computer (CISC)
Reduced Instruction Set Computer (RISC)
Examples of ISA
Instruction formats
Addressing formats
Types of instructions
Procedures, Traps, Interrupts & Co.
Assembler