mutalenucom / vol II
The Mutalenu Review · Vol II · Issue 04

Vol II · Issue 04 · A Mutalenu Long Read

From Bits to C.

How a computer actually works, from the wires up to a real C program. Notes from working through it, written for the version of you who only knew Python.

Length34 modules · 8 parts Read time~ 5 hours Last revisedMay 2026 PriceFree

Part 0

A note before you start.

By the end of this note you will be able to look at a line of C code and explain what is happening at every level of the machine: the bits, the gates, the registers, the instructions, and the memory. You will also be able to write your own C programs that compile and run.

How to use this note

This note is built bottom-up. The first part is about bits: how a computer represents anything as a number. The next part is about gates: how a few simple physical switches build up arithmetic and memory. After that you will see a whole computer as a single picture, and then you will program one: first in machine code, then in assembly, and finally in C. The C section is the last third of the note on purpose. By the time you reach it, every piece of syntax should already feel grounded in something you can see.

Each module follows the same shape. First it names the problem the technology was invented to solve. Then it draws a small mental model with a diagram. Then it shows enough code, instructions, or wiring to make the model concrete. Then it ends with a prompt that you should be able to answer in your own words before moving on.

If a prompt is hard to answer, the next module will not stick. The ideas compound. The bottom of the stack supports the top. Reread the diagram, reread the code, and try again. Skipping is expensive in this kind of material.

A note on tools

Where C code appears, it is written to be runnable on any modern Unix-like system with a C compiler. Where assembly appears, it is for a small teaching ISA called the LC-3, a tiny computer that exists to make these ideas tangible. You do not need any specific tools installed to read along.

Test yourself

Why does this note teach gates and assembly before C, instead of starting with C?

Part I. Bits, the atom of computing

Everything is a number.

A computer does not understand text, images, sound, or instructions. It understands one thing only: voltage levels in wires, which we call bits. Everything else is a convention that turns a stream of those bits into something we recognise.

1. Why everything is a number

Inside the machine, there is no such thing as a letter, a colour, or a music note. There are only wires, and at any given moment each wire is either at a high voltage or a low voltage. We call the high state 1 and the low state 0. A single one of these is a bit. That is the entire alphabet.

If you only have two symbols, you have to combine them to encode anything bigger than a yes/no question. Group eight wires together and you have 256 possible combinations of highs and lows. That is a byte, and it is enough to represent a small whole number, a single character of English text, or one channel of a single pixel of a colour image. The interpretation depends entirely on what the program decides those eight bits mean.

This is the most important idea in the entire note: the bits do not carry their meaning with them. The same byte 01000001 can be the number 65, the letter A, or part of a colour value. The program is what decides.

Once you have a way to encode whole numbers in bits, encoding the rest follows. Letters become numbers (a table assigns each letter a number). Pixels become numbers (red, green, and blue intensities, each a small whole number). Sound becomes numbers (the height of the wave, sampled many times a second). The whole digital world is a long argument over which numbers mean what.

Test yourself

If a byte holds the value 01000001, name three different things that byte could mean depending on what the program is doing.

2. Binary, hex, and conversions

Counting in binary is the same idea as counting in decimal. You just have fewer symbols, so you carry over more often. In decimal, after 9 you write 10. In binary, after 1 you write 10 (which means "two"). Each position has a place value: in decimal, the places are powers of ten; in binary, they are powers of two.

0 1 0 0 0 0 0 1 128 64 32 16 8 4 2 1 2⁷ 2⁶ 2⁵ 2⁴ 2⁰ 64 + 1 = 65
An 8-bit number. Each position contributes its power of two if the bit is 1.

To convert a binary number to decimal, add up the place values where the bit is 1. The byte 01000001 has its 64-place set and its 1-place set. 64 + 1 = 65. To go the other way, repeatedly subtract the largest power of two that fits.

Why hexadecimal exists

Binary is correct but inhuman to read. Eight bits is fine; thirty-two is a nightmare. Hexadecimal (base 16) is the convention that solves this. Each hex digit represents exactly four bits, so a 32-bit number becomes 8 hex digits. Long, but readable. Hex digits are 0 through 9 and A through F (where A = 10, B = 11, ..., F = 15).

binary    0100 0001
hex       4    1        → 0x41
decimal              65

The prefix 0x in C means "what follows is in hex". You will see this constantly. Hex is not a third number system that the computer uses internally. The computer is still using bits. Hex is just a shorthand for humans who got tired of writing 32 bits in a row.

Test yourself

What is the decimal value of the binary number 1100 1010? What is its hex representation?

3. Two's complement

Binary as we have described it can only represent zero and positive numbers. Real computers need to represent negative numbers too, and they need a representation that lets the same hardware add positive and negative numbers without having to check signs first. The convention that wins, on essentially every modern processor, is called two's complement.

The rule is simple. In an n-bit two's complement number, the leftmost bit (the highest-place bit) has a place value that is negative. So in 8-bit two's complement, the leftmost bit means -128 instead of +128. The other bits keep their normal positive place values.

That single change is enough to encode every integer from −128 to +127 in eight bits, and addition still works using the same hardware as before. This is the magic part. You add the bits, you let the carry propagate, and you throw away any carry that falls off the top.

// 5 + (-3) in 8-bit two's complement
   0000 0101    // +5
 + 1111 1101    // -3
 -----------
 1 0000 0010    // the leading 1 falls off, leaves 0000 0010 = +2 ✓

To negate a number in two's complement, you flip every bit and add 1. So +5 (0000 0101) becomes -5 by flipping to 1111 1010 and adding 1 to get 1111 1011. Sanity check: that bit pattern has the leftmost bit set (-128) plus the rest = -128 + 64 + 32 + 16 + 8 + 2 + 1 = -5. ✓

The asymmetry you should notice

An 8-bit two's complement number ranges from −128 to +127. There is one more negative number than positive number, because zero takes up one of the positive slots. This asymmetry is real and shows up in real bugs: the negation of −128 cannot be represented in 8 bits.

Test yourself

What is the 8-bit two's complement representation of -1? Of -128? Of 0?

4. Floating point

Two's complement handles whole numbers, but most real-world quantities are not whole. Distances, probabilities, brightnesses, and weights are all continuous. The computer needs a way to encode them in a fixed number of bits, knowing it cannot store every possible real number, only an approximation.

The idea is the same as scientific notation. Any number can be written as ±M × 2^E, where M is a fraction (the mantissa) and E is an integer (the exponent). If you devote some bits to the sign, some to M, and some to E, you can represent a huge range of numbers. The trade-off is that you cannot represent every number, only ones that fit the form.

S sign 1 bit EEEEEEEE exponent 8 bits (biased) MMMMMMMMMMMMMMMMMMMMMMM mantissa 23 bits
IEEE 754 single precision (a 32-bit float): 1 sign + 8 exponent + 23 mantissa.

The format above is the standard. There is a 64-bit version called "double precision" that gives more mantissa and exponent bits and so more accuracy and range. C calls these float and double. Python's float is actually a double. Same 64-bit format.

The consequence of this format you should hold onto: floating-point numbers are approximations. 0.1 + 0.2 in any language that uses IEEE 754 (which is most of them) gives 0.30000000000000004. This is not because the computer is broken. It is because 0.1 cannot be exactly represented as a sum of negative powers of two, the same way 1/3 cannot be exactly represented in decimal.

Test yourself

If you store the result of a long chain of floating-point operations, why is comparing it to a target value with == a bad idea?

Part II. Gates and circuits

From a switch to arithmetic.

A computer is built from one component repeated billions of times: a tiny electronic switch called a transistor. From that switch you build logic gates, from gates you build arithmetic and memory, and from arithmetic and memory you build everything else.

5. The transistor as a switch

A transistor is a small piece of silicon arranged so that one wire (the gate) controls whether current can flow between two other wires (the source and the drain). When the gate is at a high voltage, current flows. When it is at a low voltage, current does not. That is it. A transistor is a switch that another wire is allowed to flip.

The reason transistors changed the world is that they are tiny, fast, and cheap. A modern processor contains roughly tens of billions of transistors. They are arranged in patterns called logic gates. Small circuits that take a few input wires and produce an output wire whose voltage depends on the inputs in a specific way.

You will rarely think at the level of individual transistors when programming. You will think in terms of gates, or further up, in terms of registers and arithmetic. But knowing that the building block is a switch grounds the rest. Every operation a computer performs ultimately reduces to "this wire goes high if these other wires are in the right configuration."

Test yourself

If a transistor is essentially a switch, why does it matter that it is much faster and smaller than a mechanical switch?

6. Logic gates and Boolean algebra

Wire a few transistors together in the right shape and you get a circuit that takes one or more 1/0 inputs and produces a 1/0 output, where the output is determined by a fixed rule. There are six classic gates worth memorising. Each one has a name, a symbol, and a truth table.

GateOutput is 1 when…Notation
NOTinput is 0~A or !A
ANDboth inputs are 1A & B
ORat least one input is 1A | B
NANDnot both are 1~(A & B)
NORboth are 0~(A | B)
XORexactly one is 1A ^ B

NAND on its own is enough to build every other gate (so is NOR). That is a deep fact: if you have a chip that does only NAND, you can build a complete computer. NAND is sometimes called a universal gate for that reason. Real chips don't bother. They include all the common gates as primitives, but the theoretical minimum is one operation.

Gates obey a few algebraic laws (Boolean algebra) that let you simplify expressions. A & 1 = A. A | 0 = A. A & A = A. The most useful pair are De Morgan's laws: ~(A & B) = ~A | ~B and ~(A | B) = ~A & ~B. They say that "not both" is the same as "at least one is not", and "not either" is the same as "both are not". You will use these constantly when you start writing C conditions.

Test yourself

Using De Morgan's law, rewrite !(x > 0 && y > 0) as a condition that uses || instead of &&.

7. Combinational circuits

Wire gates together with no loops and you get a combinational circuit. The output is some function of the current inputs and nothing else. There is no memory, no state. Surprisingly many useful things can be built this way, including arithmetic.

The full adder

The smallest interesting combinational circuit is a 1-bit full adder. It takes three inputs. A, B, and a carry-in C from a less significant position, and produces two outputs: a sum bit and a carry-out. With one rule per case (a truth table), you can build it from a handful of XOR, AND, and OR gates.

A B Cin Full adder XORs + ANDs + OR Sum Cout
One-bit full adder. Chain 32 of these together and you have 32-bit addition.

To add two 32-bit numbers, you wire 32 of these in a row, feeding each one's carry-out into the next one's carry-in. The first carry-in is 0. The last carry-out tells you whether the addition overflowed. That is how every CPU does integer addition. There are speed-ups, but the core idea is exactly this.

Other combinational pieces you should know by name

A multiplexer (mux) takes several input wires and a "selector" input, and outputs whichever input wire the selector points at. It is the hardware equivalent of a giant if. A decoder takes an n-bit input and lights up exactly one of 2^n output wires. The one corresponding to the input value. It is how memory addresses get turned into "which row of memory to read." You do not need to memorise the gate-level wiring of these. You need to know what they do.

Test yourself

If a CPU has 32-bit-wide arithmetic and you add two large 32-bit numbers, how does the CPU know that an overflow happened?

8. Memory: latches, flip-flops, registers

Combinational circuits compute, but they do not remember. The output depends only on the current inputs. To build memory you have to introduce a feedback loop. Wire some output back into an input, so that the circuit can be in a state that "holds" itself.

The smallest such circuit is the SR latch: two NOR gates wired so each one's output is the other's input. It has two inputs, S (set) and R (reset), and one output Q. When you pulse S, Q becomes 1 and stays there even after S goes back to 0. When you pulse R, Q becomes 0 and stays there. Memory.

A D flip-flop is a small refinement: it has one data input D and a clock input. On each tick of the clock, whatever was on D gets latched into the output. Until the next clock tick, the output stays put. You can think of it as a 1-bit storage cell that updates exactly once per clock pulse.

Glue 32 of these flip-flops side by side, wire their clocks together, and you have a 32-bit register. A tiny named slot of memory that the CPU can read from and write to in a single clock cycle. A real CPU contains dozens of these. They are the closest, fastest memory you can use.

The clock

Sequential circuits, the ones with memory, need a way to coordinate when memory updates. That is the clock: a wire that toggles between 0 and 1 at a fixed rate. On each rising edge, every flip-flop in the system samples its input and updates. Everything in a synchronous CPU happens in lockstep with this signal.

When you read that a CPU runs at "3 GHz", that is the clock: three billion ticks per second. Each tick is the longest amount of time any single combinational path between flip-flops is allowed to take. If you want a faster CPU, either tick faster (which gets harder past a few GHz) or do more useful work per tick (which is most of what modern CPUs are about).

Test yourself

Why does adding feedback (an output wired back to an input) turn a logic circuit into something that can store information?

Part III. A whole computer

The whole machine, in one picture.

A computer is a fixed arrangement of five things: a place to put data, a place to do operations on that data, a place to put the program, a way to talk to the world, and a controller that reads the program and decides what to do next.

9. The von Neumann model

The architecture nearly every modern computer uses is the von Neumann model. Its key claim is that the program and the data both live in the same memory, and the CPU reads instructions from that memory one at a time and executes them. The model has five parts.

Memory program + data Input keyboard, disk CPU control + ALU Output screen, disk The CPU fetches an instruction from memory, decodes it, executes it, and repeats. I/O moves data between memory and the outside world.
The five-part picture. Every laptop, phone, and microcontroller follows it.
  1. Memory. A long array of bytes, each one with a numerical address. Holds both the program (as bytes that mean instructions) and the data (as bytes that mean numbers, characters, pixels).
  2. Processing unit (ALU). The arithmetic-logic unit. Takes two inputs and an operation (add, subtract, AND, shift, compare) and produces an output. This is what you built up in Part II.
  3. Control unit. Reads the next instruction from memory, figures out what it is, and orchestrates the ALU and registers and memory accordingly. The control unit is the conductor.
  4. Input. Anything that turns the outside world into bytes the program can read. Keyboards, mice, disks, network cards.
  5. Output. Anything that turns bytes the program produced into something in the outside world. Screens, speakers, disks again, network cards again.

The CPU is the combination of the processing unit and the control unit. They are usually on the same chip, sharing a small set of registers. From the chip's point of view, the rest of the world is just memory addresses to read from and write to.

Test yourself

The von Neumann model says that programs and data live in the same memory. Why is that fact what makes a computer "general-purpose"?

10. The datapath and the instruction cycle

Inside the CPU, the registers and the ALU are wired together by a network called the datapath. The control unit's job is to set the right multiplexer selectors and the right "write enable" signals on each clock cycle so that data flows along the right path. Everything the CPU does is some pattern of "open this gate, close that gate, latch this register, route this number through the ALU."

Each instruction the CPU executes goes through the same five-step instruction cycle. Memorising this cycle is one of the most useful things you can take from this note.

  1. Fetch. The control unit reads the instruction at the address held in a special register called the program counter (PC). The instruction (usually 16 or 32 bits) gets latched into another special register called the instruction register (IR).
  2. Decode. The control unit looks at the bits of the IR and figures out what the instruction is. Add, load, branch, store. The bit pattern that tells you which is called the opcode.
  3. Evaluate the address. If the instruction needs an address (load this from memory, branch to that), the control unit computes it. This may involve adding offsets, reading a register, or both.
  4. Fetch operands. The actual values the instruction needs are read. From registers, or from memory at the address that was just computed.
  5. Execute and store. The ALU does its operation, and the result is written back to a register or to memory. The PC is updated to point at the next instruction (usually just PC + 1 instruction size, unless a branch took us somewhere else).

A simple CPU does this whole cycle, top to bottom, on every instruction. A modern CPU is much cleverer. It overlaps stages of multiple instructions (this is called pipelining), executes multiple instructions per cycle, and reorders them. But the cycle above is what is logically happening on every line of code that runs.

Test yourself

If the program counter is at address 1000 and the instruction at 1000 is 4 bytes long, what is the program counter after a normal (non-branching) instruction completes?

11. Memory and addresses

Main memory is a giant array. Each slot has a unique address, which is itself a number. To read or write a slot, the CPU puts the address on the address bus (a bunch of wires going to the memory chip), and either reads the value back on the data bus or writes a value out on it.

Address space size is determined by how many wires the address bus has. A 32-bit address bus can represent 2^32 distinct addresses, which is about 4 billion, that is the famous "4 GB limit" of 32-bit operating systems. Modern processors use 64-bit addresses, which is a far larger number than any current machine has memory for.

How big is each slot? Almost always one byte. Eight bits. So memory is byte-addressable. To read a 32-bit (4-byte) integer, the CPU asks for four consecutive bytes starting at some address, and the memory subsystem returns them as a single 32-bit value.

Endianness

If you store the 32-bit number 0x12345678 at address 1000, in which byte does 0x12 end up? There are two conventions. Big-endian stores the most significant byte at the lowest address (1000 → 0x12, 1001 → 0x34, ...). Little-endian stores the least significant byte at the lowest address (1000 → 0x78, 1001 → 0x56, ...). x86 and ARM are both little-endian, which is why most code you read assumes that, but the question of byte order matters whenever you serialise numbers across a network or to disk.

Test yourself

If a system has a 64-bit address bus, what is the maximum amount of memory it can address? (You can leave the answer as 2^N.)

Part IV. An ISA you can run

Programming a computer for real.

An instruction set architecture is the contract between hardware and software: the list of instructions a CPU promises to understand. We will use a small teaching ISA called the LC-3 to make every concept concrete. It is small enough to learn end to end, and the lessons transfer directly to real machines.

12. What an instruction set is

The CPU understands a fixed list of instructions, each of which is just a particular bit pattern. The list and its rules together are called the instruction set architecture, or ISA. Different chip families have different ISAs. X86 (Intel, AMD), ARM (Apple, most phones), RISC-V (open standard), and many others.

An ISA specifies, at minimum: what registers the CPU has, what instructions it supports (and the bit pattern for each), what addressing modes are allowed (how memory addresses are computed), and how interrupts and traps are handled. It is the "API" of the chip.

Two programs written in the same high-level language (say, C) can be compiled to two completely different ISAs. The compiler knows the rules of each ISA and translates accordingly. The high-level language is portable; the resulting machine code is not. That is why a Mac binary does not run on a Windows machine of the same era. The underlying ISAs may be different (or the OS conventions may be different even if the ISA matches).

RISC vs CISC

Two design philosophies have competed for decades. RISC (reduced instruction set computer) keeps the instruction list short and simple. Every instruction is a single fixed size, and complex things are built by combining simple ones. ARM and RISC-V are RISC. CISC (complex instruction set computer) has many more instructions, including some that do very specific complex things in one go. x86 is CISC. Modern x86 chips internally translate CISC instructions into something RISC-like before executing, so the distinction is fuzzier than it used to be.

Test yourself

Why does the same C program compile to different machine code on a Mac (ARM) and a Linux x86 server, even though the source code is identical?

13. The LC-3, in one diagram

The LC-3 is a teaching ISA designed to be small enough to fully understand. It has:

  • Eight general-purpose registers (R0 through R7), each 16 bits wide.
  • 16-bit memory addresses, so 2^16 = 65,536 memory locations. Each location holds 16 bits, not 8. The LC-3 is "word-addressable" with 16-bit words.
  • A program counter (PC), also 16 bits.
  • Three condition flags. N (negative), Z (zero), P (positive), that the CPU sets after most operations, so the next instruction can branch on them.
  • Fifteen instructions. That is the entire instruction set.
Registers R0 16 bits R1 16 bits R7 16 bits Special PC (program counter) IR (instruction reg) Flags N Z P Eight 16-bit registers. The PC tells the CPU which instruction to fetch next. N, Z, P get set by most operations and let the next instruction branch on the result.
The visible state of an LC-3 CPU. Real CPUs have more, but the structure is the same.

That is it. The whole machine: fifteen instructions and a handful of registers, enough to write any program in principle. Real CPUs have more registers (and very different ones), more instructions, and many more states (caches, pipeline stages, vector units), but the essence is unchanged.

Test yourself

If LC-3 memory has 16-bit addresses and each memory location holds 16 bits, how many bytes of memory does an LC-3 program have access to in total?

14. The LC-3 instruction set

Every LC-3 instruction is exactly 16 bits long. The top four bits are the opcode. They tell the CPU which instruction this is. The remaining 12 bits are the operands (which registers, what address, what immediate value). This fixed-size, fixed-layout design is what makes RISC ISAs straightforward to decode.

The instructions fall into a few groups.

Arithmetic and logical

ADD R1, R2, R3       ; R1 = R2 + R3
ADD R1, R2, #5       ; R1 = R2 + 5  (small immediate)
AND R1, R2, R3       ; R1 = R2 AND R3 (bitwise)
NOT R1, R2           ; R1 = NOT R2  (bitwise complement)

That is it for arithmetic on the LC-3. There is no SUB, MUL, DIV. You build subtraction with NOT and ADD, and you build multiplication with a loop. Real ISAs have many more, but the principle is the same: a fixed set of one-cycle operations, composed by software to do everything else.

Loads and stores

LD  R1, LABEL        ; load the value at address LABEL into R1
ST  R1, LABEL        ; store R1 into memory at address LABEL
LDR R1, R2, #4       ; load from address (R2 + 4) into R1   (indexed)
LDI R1, LABEL        ; load indirect. Read the address from LABEL, then load from that address

LD and ST are the only instructions that talk to memory. Every other instruction works on registers. This is a defining feature of RISC: load-store architecture. To operate on memory, you load it into a register, operate, then store back. Real CPUs work the same way.

Control flow

BRnzp LABEL          ; branch to LABEL if N, Z, or P flag is set
JMP  R1               ; jump to address held in R1
JSR  LABEL            ; jump to subroutine; saves return address in R7
TRAP x21              ; system call (we will see this in Part V)

Branch instructions test the flags (N, Z, P) and either continue or jump. To do "branch if equal," you compare two values (subtract them) and branch if the Z flag is set afterward. This is how every conditional in every high-level language eventually compiles down: a comparison that sets flags, followed by a flag-checking branch.

Test yourself

The LC-3 has no SUB instruction. How would you compute R1 = R2 - R3 using only ADD and NOT? (Hint: think about two's complement.)

15. Assembly and the assembler

The mnemonics like ADD, LD, and BR are a programmer's convenience. The CPU does not see the letters "A-D-D". It sees the 16-bit pattern that ADD expands to. The translation from mnemonics to bits is done by a small program called the assembler.

An assembly program is a text file. Each line is either an instruction, a label (a name for an address), or an assembler directive (a hint to the assembler that does not produce a machine instruction). Here is a complete LC-3 program that adds the numbers from 1 to 10.

        .ORIG x3000        ; place this program starting at address 0x3000

        AND  R0, R0, #0    ; R0 = 0  (the running sum)
        AND  R1, R1, #0
        ADD  R1, R1, #10   ; R1 = 10 (the counter)

LOOP    ADD  R0, R0, R1    ; sum += counter
        ADD  R1, R1, #-1   ; counter -= 1
        BRp  LOOP          ; if counter is positive, loop again

        TRAP x25           ; HALT
        .END

The assembler reads this twice. On the first pass it builds a symbol table, mapping every label to the address it appears at. On the second pass, it generates the machine code, replacing each label with the right address (or offset) and each mnemonic with its opcode bits. The output is a binary file the LC-3 can load into memory and execute.

Real assemblers for x86 or ARM do the same job, just with bigger instruction sets. The mental model, assembly is one-to-one with machine code, plus a few conveniences. Is identical.

Test yourself

Why does the assembler need to make two passes through the source code? What goes wrong if it tries to do everything in one pass?

16. Subroutines and the stack

Programs are written in pieces called subroutines (in C, we call them functions). To use one, you have to do four things: pass the arguments in, jump to the subroutine, run it, and come back to where you were with the return value in hand.

The LC-3's JSR instruction handles two of those: it jumps to the subroutine and saves the return address in R7. To return, the subroutine just jumps to R7 (using RET, which is the LC-3 mnemonic for JMP R7). Arguments and return values are passed in registers by convention.

This works fine until subroutines call other subroutines. If main calls foo which calls bar, both JSR calls overwrite R7. Now foo has lost its return address. The fix is to push R7 onto a stack before calling another subroutine, and pop it back before returning.

main's locals foo: saved R7 foo's locals bar: saved R7 bar's locals low addr ↑ stack pointer → ↓ stack grows down
Each function gets its own region of the stack. Its frame. To hold the return address and locals.

The stack is a region of memory plus a register called the stack pointer (SP) that points to its top. To push, you decrement SP and write to the new top. To pop, you read from the top and increment SP. Pushes and pops are LIFO (last in, first out), which is exactly what nested function calls need.

Each function call gets its own piece of the stack, called a stack frame. The frame holds the saved return address, any saved registers, and any local variables. When the function returns, its frame is popped and the previous function resumes from where it was. This is called the call stack, and you have used it every time you have written a recursive function.

Test yourself

If a function forgets to restore the saved return address before returning, what happens when it returns?

Part V. I/O, traps, and interrupts

How the CPU talks to the world.

A CPU on its own is a closed system: it reads instructions from memory and writes results back. To do anything useful, like show pixels, take keystrokes, or talk to a disk, it has to communicate with the outside world. That communication is the job of I/O, system calls (TRAPs), and interrupts.

17. I/O: how the CPU talks to the world

Devices like keyboards, displays, and disks are not part of the CPU. They have their own electronics, their own clocks, and their own internal registers. But the CPU still needs to read and write them. The trick that nearly every modern computer uses is called memory-mapped I/O: a few special memory addresses are wired so that reading or writing them actually talks to a device, not to RAM.

For example, on the LC-3, address 0xFE00 is the keyboard status register. The CPU reads it; if the high bit is 1, a key is ready. Address 0xFE02 is the keyboard data register. Read it to get the character. To print a character, write it to address 0xFE06 (the display data register), after checking that 0xFE04 (the display status register) is ready. From the CPU's point of view, this is identical to reading and writing memory; the address bus and data bus are just routed differently for these specific addresses.

Polling vs interrupts

If your program wants to know "did the user press a key yet?" the simplest approach is to read the keyboard status register in a loop until the high bit goes high. This is called polling. It works, but it wastes CPU. Most of those reads return "no, not yet."

The better approach is interrupts: the device tells the CPU when it has data, instead of the CPU having to keep asking. We will look at that mechanism in Module 19.

Test yourself

Why is polling a wasteful way to wait for keyboard input on a system that has many other things to do?

18. TRAPs and the OS service

User programs should not poke at I/O addresses directly. The keyboard register is one resource shared by every program; the disk is another. If two programs read from the keyboard at the same time, chaos. The solution is to put the operating system in charge of the hardware, and make user programs ask the OS to do I/O on their behalf.

On the LC-3, this is done through the TRAP instruction. A TRAP is essentially a "system call." The instruction includes an 8-bit number called the trap vector. When the CPU sees TRAP x21, it does this:

  1. Saves the current PC into R7 (so it can come back later).
  2. Looks up the trap vector in a table at the bottom of memory. Each entry in the table is the address of an OS routine.
  3. Jumps to that address.

The OS routine runs in supervisor mode, which has permissions the user program does not. Including the right to talk to I/O directly. When the OS routine finishes, it executes RET, which jumps back to R7, returning control to the user program where it left off.

Why this matters in real systems

Every printf in a C program eventually issues a system call to the OS to ask it to actually write characters to the terminal. Every open, read, write, and close on a file is a system call. Every malloc that needs more memory eventually asks the OS. The TRAP mechanism on the LC-3 is a stripped-down version of the system-call mechanism on every real OS.

Test yourself

Why should user programs not be allowed to write directly to I/O hardware addresses, even though the CPU technically can?

19. Interrupts and exceptions

An interrupt is the inverse of a TRAP. A TRAP is the program asking the OS for help. An interrupt is the hardware (or some asynchronous event) interrupting the CPU to demand attention.

When a key is pressed, the keyboard controller raises an interrupt line on its way to the CPU. At the end of whatever instruction the CPU is currently running, the CPU checks the interrupt line. If it is asserted (and interrupts are enabled), the CPU does roughly the same thing as a TRAP: it saves the PC and the flags, looks up the interrupt vector in a table, and jumps to the corresponding handler. The handler reads the keystroke, stashes it somewhere, and returns. The interrupted program resumes as if nothing had happened.

Interrupts are how a single CPU can give the appearance of doing many things at once. The user types, network packets arrive, the disk finishes a read, the timer ticks. Each one fires an interrupt, and the OS handlers do a tiny bit of work for each, then return to the program. Modern multitasking is built almost entirely on the timer interrupt: the OS sets a timer to fire every few milliseconds, and the timer interrupt's handler decides whether to switch to a different program. That switching is what makes your computer feel like it is running fifty things in parallel.

Exceptions

An exception is like an interrupt, but it is caused by something the running instruction did wrong: dividing by zero, accessing a memory address that is not mapped, executing an undefined opcode. The CPU detects the bad situation, treats it like an interrupt, and jumps to a handler. The handler decides what to do. Usually it kills the offending program and prints "segmentation fault" or some similar message.

Test yourself

You have a single-CPU computer running ten programs at once. How does the timer interrupt make this possible?

Part VI. From assembly to C

A higher level, with the wires still visible.

Writing every program in assembly is exhausting and error-prone. C exists as the smallest possible step up from assembly that still feels like a real programming language. By design, every C construct corresponds to something concrete in assembly, which means everything you learned in the last five parts now becomes a tool for understanding what your code is actually doing.

20. Why C exists

C was designed in the early 1970s to write the Unix operating system. The goal was to be portable. To write Unix once and have it run on different machines. Without giving up the low-level control assembly gives you. Every C feature is a thin layer over machine code. Every + is one or two assembly instructions. Every variable is a register or a memory slot. Every function call is a JSR-and-stack-frame.

What C gives you over assembly: structured control flow (if, while, for) instead of explicit branches and labels; types (int, char, double) so the compiler can pick the right instructions; functions with named parameters instead of register conventions; and portability. The same C program compiles to x86, ARM, RISC-V, and a dozen other ISAs, each with its own assembler.

What C does not give you, on purpose: garbage collection, bounds checking on arrays, hidden allocations. If you ask C for ten bytes of memory, you get exactly ten bytes. If you read past the end of an array, C does not stop you. The CPU happily reads whatever is at that address. C trusts the programmer, which is what makes it fast and what makes it dangerous.

This is why operating systems, compilers, databases, and embedded firmware are still mostly written in C (or its descendants). When you need to know exactly what every instruction is doing, you write in something that maps cleanly to instructions.

Test yourself

Why is it accurate to call C "portable assembly," and what are the implications of that for the kinds of bugs you can have in a C program?

21. A first C program

Here is a complete, runnable C program. Save it as hello.c, compile it with gcc hello.c -o hello, run it with ./hello.

// hello.c. The smallest interesting C program
#include <stdio.h>

int main(void) {
    printf("Hello, world.\n");
    return 0;
}

Three things to notice. First, the program is a function called main. When you run a C program, the operating system arranges for main to be called. The return value of main is the program's exit status. 0 means success, anything else means failure.

Second, the #include <stdio.h> at the top is a request to the preprocessor to paste in the contents of stdio.h, which contains the declaration of printf. C does not know about printf by default. The header tells the compiler what its arguments and return type are.

Third, printf is not a built-in operator. It is a function in a library that comes with every C compiler. When you call it, you are calling code that someone else wrote and that gets linked into your program automatically. Eventually it issues a system call (a TRAP, in our terms) to the OS to actually write to the terminal.

Compile, link, run

The command gcc hello.c -o hello is doing three jobs. First, the preprocessor handles #include, #define, and other directives. Second, the compiler translates the C into assembly, then into a binary file called an object file (.o). Third, the linker combines your object file with the standard library (where printf lives) into a single executable. The final binary is what you run.

Test yourself

If you delete the line #include <stdio.h>, the program may still compile but you will get a warning. Why does it still compile, and what is the warning telling you?

22. Variables, types, operators

A variable in C is a name for a piece of memory. The variable's type tells the compiler how big that piece is and how to interpret the bits in it.

int    a = 42;            // 32-bit signed integer (usually)
char   c = 'A';           // 8-bit, holds a single character (or small int)
float  f = 3.14f;         // 32-bit IEEE 754
double d = 3.141592653589; // 64-bit IEEE 754

unsigned int u = 100;    // 32-bit, no negative values
long long    big = 9999999999LL; // 64-bit signed

The reason types matter at the machine level: a char takes one byte of memory and uses one of the byte-addressed instructions. An int takes four bytes and uses 32-bit instructions. Adding two doubles uses entirely different floating-point instructions than adding two ints. The type tells the compiler which instructions to pick.

Operators

C has the operators you would expect. The arithmetic ones are +, -, *, /, %. The comparison ones are ==, !=, <, <=, >, >=. The logical ones are && (and), || (or), ! (not). The bitwise ones. These are unique to C-ish languages. Are & (and), | (or), ^ (xor), ~ (not), << (shift left), >> (shift right).

Bitwise operators are exactly the gates from Module 6, applied to every bit of an integer in parallel. Shifts move all bits left or right by a given amount; shifting left by N is the same as multiplying by 2^N, when nothing falls off the top.

Integer overflow

An int on most systems is 32 bits, so the largest value it can hold is about 2.1 billion. Add 1 to that and you get the smallest negative int, that is, integer overflow wraps around in two's complement, exactly as you saw in Part I. Most C compilers do not warn you. This is the source of an entire class of real-world bugs.

Test yourself

Why does a << 3 in C produce the same result as a * 8 for small values of a, and where does the equivalence break down?

23. Control flow: if, while, for, switch

Every higher-level control structure is, underneath, a comparison plus a branch.

if (x > 10) {
    printf("big\n");
} else {
    printf("small\n");
}

The compiler emits something equivalent to: subtract 10 from x, branch if the result is positive to the "big" code, otherwise fall through to the "small" code. After either path runs, jump to the code after the if. That is exactly what you wrote in LC-3 assembly with BRp.

int i = 0;
while (i < 10) {
    printf("%d\n", i);
    i++;
}

A while is: check the condition; if false, jump past the loop; otherwise run the body and jump back to the check. for (init; cond; step) is exactly equivalent to a while with the initialisation before, and the step at the bottom of the body. do { ... } while runs the body once before checking.

switch compiles to either a chain of compare-and-branch (for sparse cases) or a jump table. An array of code addresses indexed by the switch value. Jump tables are extremely fast for dense cases; the compiler picks whichever form is better.

Short-circuit evaluation

&& and || in C are short-circuiting: in a && b, if a is false, b is never evaluated. This is not a syntactic nicety. It is built into the way the compiler emits code, and you can rely on it. if (p != NULL && *p == 5) is safe: if p is null, the dereference never happens. We will see why that matters once we get to pointers.

Test yourself

Why is for (int i = 0; i < n; i++) identical in behaviour to a while loop, and what does the compiler actually emit for both?

24. Functions and the stack frame

A C function compiles down to a labelled chunk of assembly. Calling it issues a JSR (or its x86 equivalent, call). The function's parameters are passed by convention. The first few in registers, the rest on the stack.

int sum(int a, int b) {
    int result = a + b;
    return result;
}

int main(void) {
    int total = sum(3, 4);
    return 0;
}

When main calls sum, the compiler emits code to put 3 and 4 in the right places (registers or the stack), then calls. sum creates its own stack frame: it saves any registers it plans to clobber, allocates room for result, does the arithmetic, puts the return value in a designated register, and pops its frame. main then reads the return value from that register and stores it in total.

The exact rules for which arguments go where (this is called the calling convention) depend on the platform. On x86-64 Linux, the first six integer arguments go in specific registers and additional ones go on the stack. On ARM64 the first eight integer arguments go in registers. You rarely need to know the exact rules unless you are debugging or writing assembly. You only need to know that there is a convention and that both sides of the call follow it.

Local variables live on the stack

The variable result in sum does not exist before sum is called and does not exist after it returns. It is a slot in sum's stack frame, and that frame is gone the moment sum returns. This is why you cannot return a pointer to a local variable from a function. The memory it points at will be reused for the next function's frame the next time anything is called.

Test yourself

What goes wrong if a function returns the address of one of its own local variables?

Part VII. Pointers and memory

The part where everything clicks.

Pointers are usually framed as the hard part of C. They are not hard once you know what memory looks like, which you already do. A pointer is a number, that number is a memory address, and that is essentially the whole concept.

25. Pointers, what they really are

A pointer is a variable whose value is a memory address. That is it. On a 64-bit system, a pointer is 8 bytes, exactly the size of a memory address. The type of the pointer tells the compiler how to interpret what it points at, but the value is just a number.

int  x   = 42;       // x lives somewhere in memory, holding 42
int *p   = &x;       // p holds the address of x
int  y   = *p;       // dereference: read the int at the address in p
*p           = 100;      // dereference and write: now x is 100

Two operators do all the work. & ("address-of") gives you the memory address of a variable. * ("dereference") gives you the value at the address held by a pointer. The type after the * in a declaration is what the pointer points at: int *p is "p is a pointer to an int." The compiler now knows that *p reads (or writes) 4 bytes, and that p + 1 means "the address of the next int" (i.e., 4 bytes further), not "p plus 1 byte."

NULL and uninitialised pointers

A pointer that has not been set yet contains garbage. Whatever bits happened to be in that memory slot. Dereferencing a garbage pointer reads or writes a random address. Always initialise pointers, and use the constant NULL to mean "this pointer points at nothing." Then check for NULL before dereferencing.

Writing through a null or invalid pointer is the most common cause of crashes in C programs. The OS notices that the program is touching unmapped memory, raises an exception, and kills the program. You see "segmentation fault."

Test yourself

If p is an int * pointing at address 0x1000, what is the value of p + 1? Why is it not 0x1001?

26. Arrays and pointer arithmetic

An array in C is a contiguous block of elements of the same type. int nums[10] reserves 40 bytes (10 ints × 4 bytes) at one address. The name nums is, in most contexts, the address of the first element, that is, a pointer to its first element.

int nums[5] = { 10, 20, 30, 40, 50 };

int a = nums[2];        // 30
int b = *(nums + 2);    // also 30. Exactly the same instructions

Array indexing is a thin alias for pointer arithmetic. nums[i] means *(nums + i). The compiler emits the same instructions either way: take the base address, multiply i by the element size (4 for int), add to the base, dereference. This is why array indexing in C is a single arithmetic operation, not a function call.

Bounds checking, or the lack of it

C does not check that i is in range. If you write nums[1000], the compiler computes the address nums + 1000*4 and reads or writes there. It might be your program's own memory; it might be someone else's. Either way, you have a bug. This is called a buffer overflow, and it has been the cause of an enormous fraction of security vulnerabilities in real software for decades.

This is the price you pay for the speed of C. Languages that check bounds (Python, Java, Rust) trade a small amount of runtime cost for safety. C trades safety for raw speed and direct hardware access.

Test yourself

If arr is a char arr[10], what does arr + 5 compute? What about (int *)arr + 5?

27. Strings, arrays of char

C has no built-in string type. A string in C is a char array whose last byte is the value 0 (the null terminator). Functions that take strings keep reading characters until they hit a 0 byte.

char s[] = "hello";    // 6 bytes: 'h', 'e', 'l', 'l', 'o', '\0'
printf("%s\n", s);     // %s tells printf to read until null

The string literal "hello" in source code is shorthand for the array. The 0 at the end is added by the compiler. This is why C strings are sometimes called null-terminated or C-strings.

The standard library has functions for working with them: strlen (count characters before the null), strcpy (copy from one string to another), strcmp (compare two strings byte by byte). All of them rely on the null terminator. If you forget to null-terminate, these functions read off the end of your buffer and into whatever happens to be next in memory.

Why this design choice

You might wonder why C does not store the length explicitly, the way Python or Java does. The answer is that C strings cost no extra memory beyond the characters and the null byte, and the design pre-dates languages that had the luxury of using more memory. Modern languages all use length-prefixed strings. The trade-off makes sense now. C is what we still use when we need to talk to the OS or to network protocols, both of which often use null-terminated strings.

Test yourself

Why is strlen on a C string an O(n) operation, whereas len() on a Python string is O(1)?

28. Recursion: the stack remembers

A recursive function is one that calls itself. With everything you now know about stack frames, recursion should be entirely demystified.

int factorial(int n) {
    if (n <= 1) return 1;
    return n * factorial(n - 1);
}

When you call factorial(5), the function pushes a new stack frame. Inside it, n is 5, the test fails, and the function calls factorial(4). Pushing another stack frame, this time with n = 4. And another, and another, all the way down to factorial(1), which returns 1 immediately. Then the unwinding begins: each frame multiplies its n by the returned value and pops, returning to the frame above.

There is no magic in recursion. It is just function calls. Each call gets its own frame, with its own n, and the stack remembers all the in-progress calls. When the base case returns, the stack unwinds in reverse order.

Stack overflow

The call stack is a finite block of memory, usually a few megabytes. Recurse too deep and you run out, and the OS kills the program with a "stack overflow" error. factorial(100000) will overflow on most systems. factorial(100) will not. This is a constraint that interpreted languages also have, but in C the limit is often lower because each frame may include more local variables.

Test yourself

Sketch what the call stack looks like, frame by frame, just before factorial(1) returns 1.

29. Stack, heap, malloc, free

So far every variable in our C programs has either lived in a register or on the stack. Both have a key limitation: their lifetime is tied to the function that created them. The moment the function returns, the memory is reused. If you want a piece of data that outlives the function that created it, you need a different region of memory.

That region is the heap. Unlike the stack, the heap is managed explicitly by the program. You ask for memory with malloc, and you give it back with free.

#include <stdlib.h>

int *make_array(int n) {
    int *p = malloc(n * sizeof(int));
    if (p == NULL) return NULL;          // malloc can fail
    for (int i = 0; i < n; i++) p[i] = i * 10;
    return p;                                // safe! p points at heap memory
}

int main(void) {
    int *arr = make_array(5);
    // ... use arr ...
    free(arr);                                  // give the memory back
    return 0;
}

The pointer returned by malloc stays valid until free is called on it. You can pass it between functions, return it, store it in a struct. The lifetime is now under your control instead of being tied to a function call.

The two ways this goes wrong

Memory leak. You malloc and never free. The OS does not reclaim heap memory automatically, that is C, not Python. Long-running programs that leak end up exhausting RAM.

Use after free. You free a pointer, then keep using it. The memory might still hold the old data for a while, but malloc is free to hand it out to the next caller. Now two parts of the program are stomping on each other's data, and the bugs are subtle and irreproducible.

Modern languages solve these with garbage collection (Python, Java) or compile-time ownership tracking (Rust). C makes you do it by hand. The cost of getting it right is a real engineering discipline, but the upside is that you know exactly when every byte is allocated and freed.

Test yourself

Why is it safe for make_array to return a pointer to heap memory, but unsafe to return a pointer to a local stack array?

Part VIII. Real programs

From the small parts to a working whole.

You now have everything you need to read and write real C. The remaining modules cover the conventions and standard-library tools that turn a collection of features into a real program.

30. I/O in C: printf, scanf, files

The standard library has a small set of functions for reading and writing. The two you have already met:

printf("x = %d, name = %s\n", x, name);
scanf("%d", &x);    // reads an int from standard input into x

Both use format strings. Each % in the format string is a placeholder for one argument. %d is "decimal integer," %s is "string," %f is "float," %x is "hexadecimal," %c is "single character." There are dozens of these. The compiler does not check that your format string matches the arguments. If you say %d and pass a string, the program reads garbage. printf is a famous source of bugs for this reason; modern compilers issue warnings if they can.

For files, the corresponding functions are fopen, fprintf, fscanf, fread, fwrite, fclose. Open a file with FILE *f = fopen("name.txt", "r");, read with fread or fscanf, close with fclose. Always check the return value of fopen. It returns NULL if the open failed.

Underneath, all of these eventually issue system calls (read, write, open, close). TRAPs into the OS, which does the actual hardware work. printf is a thin layer over those system calls that adds the formatting and a small buffer.

Test yourself

Why does scanf("%d", &x) need an & in front of x, but printf("%d", x) does not?

31. Structs and linked data

A struct is a way to group several values of different types into one named thing. Structs are how you build records. A person with a name and an age, a point with x and y, a packet with a header and a body.

struct point {
    int x;
    int y;
};

struct point p1 = { 3, 4 };
int sum = p1.x + p1.y;

A struct is laid out in memory as the concatenation of its fields. struct point is 8 bytes, one int after another. The field names are the compiler's way of remembering offsets: p1.x is "the int at offset 0 of p1," p1.y is "the int at offset 4 of p1." Once compiled, no field names exist; everything is offsets and sizes.

Pointers to structs

You will pass structs around mostly by pointer, because passing them by value copies every field, and structs can be large.

void print_point(struct point *p) {
    printf("(%d, %d)\n", p->x, p->y);
}

The arrow operator p->x is shorthand for (*p).x. Dereference the pointer, then take the field. It is the most common operator in idiomatic C; you will see it on every other line of real C code.

Test yourself

If struct point is laid out as two ints, and an int is 4 bytes, what is the offset of field y within the struct? What is the size of the whole struct?

32. Linked lists in practice

A struct can contain a pointer to another struct of its own type. That is the building block of every dynamic data structure in C.

struct node {
    int value;
    struct node *next;     // pointer to the next node, or NULL
};

struct node *head = malloc(sizeof(struct node));
head->value = 10;
head->next  = malloc(sizeof(struct node));
head->next->value = 20;
head->next->next  = NULL;

That is a linked list. Each node points to the next; the last node has a NULL next pointer. To walk the list:

struct node *cur = head;
while (cur != NULL) {
    printf("%d\n", cur->value);
    cur = cur->next;
}

Linked lists illustrate the key trade-off between contiguous and pointer-based data structures. An array gives you O(1) random access but O(n) insertion in the middle. A linked list gives you O(1) insertion but O(n) random access. Modern hardware makes arrays faster than this analysis suggests because of cache locality. The CPU loads memory in chunks, and contiguous data plays nicely with that. Linked lists scatter data across memory, which is cache-unfriendly. Most real production C code uses arrays unless there is a very specific reason not to.

Test yourself

To free a linked list correctly, in what order do you need to free its nodes? Why?

33. From C to assembly: what the compiler does

Take a C function and run gcc -S on it. The compiler will write out the assembly it generates, instead of going all the way to a binary. Read it. The exercise is profoundly useful, because every C feature you have learned now corresponds to a small chunk of assembly you can recognise.

// add.c
int add(int a, int b) {
    return a + b;
}

On x86-64, with a typical compiler, gcc -S add.c produces something close to:

add:
    leal  (%rdi, %rsi), %eax     ; eax = rdi + rsi
    ret                          ; return (eax holds the return value)

Two instructions. The first argument a arrived in register rdi, the second b in rsi. The result goes in eax by convention. The ret instruction pops the return address off the stack and jumps to it.

For more complex functions you will see locals being allocated on the stack (sub $16, %rsp reserves 16 bytes), values being read and written through pointers, function calls following the calling convention, and at higher optimisation levels, surprising rewrites where the compiler removes work it can prove is unnecessary.

Why this exercise pays off

Reading the compiler's output is the fastest way to develop a true intuition for performance. You see exactly which lines of C compile to lots of instructions and which compile to almost nothing. You see why i++ in a loop is sometimes free and sometimes not. You see what the compiler can optimise and what it cannot. You also notice undefined behaviour, when the compiler decides your code is unreachable and removes it.

Test yourself

If you compile a function with -O0 (no optimisation) and again with -O2, the assembly will look very different. Why do you read -O0 output to learn the language but -O2 output to understand performance?

34. Reading a real C codebase

The final test of this note is the same as the final test of any introduction to a language: pick up code that someone else wrote and read it. A few exercises that will exercise everything in this note.

Read the source of strcpy, the C standard library function that copies one string to another. The reference implementation is four lines, and every line uses pointers, dereferences, post-increment, and the null terminator. Predict what each line does before you read the next.

Read the first few hundred lines of a small C project on GitHub. Good candidates: tcc (a tiny C compiler, surprisingly readable); the sqlite source (one massive C file, with extensive comments); the source of curl (a network tool you have probably used). Notice the patterns. Error handling with return codes, ownership annotated by comments, structs passed by pointer, careful null checks.

Run a debugger. Compile a program with -g, run it under gdb or lldb, set breakpoints, step through. Print the values of registers. Print memory at addresses. Watch the stack grow and shrink as functions are called. Everything you learned in this note is visible from inside a debugger. It is the first place you will see how the abstractions you have built map onto a real running program.

What you should be able to do now

  • Take any byte and explain three different things it could mean.
  • Add and subtract two's complement numbers without converting to decimal.
  • Sketch the gates inside a 1-bit adder.
  • Trace an instruction from fetch to writeback.
  • Write a small assembly program for the LC-3, by hand, and trace its execution.
  • Explain what a TRAP, an interrupt, and an exception each are, and how they differ.
  • Write a C program that allocates and frees memory correctly.
  • Read C code that uses pointers and explain what it does.
  • Compile a small function and read the assembly the compiler produced.

That is the full list of objectives for this note. If most of them feel comfortable, you are done. If a few feel shaky, the relevant module is at the top of this page.

Final prompt

In one paragraph: explain what happens, from the moment a key is pressed on the keyboard to the moment the corresponding character appears in a running C program waiting on scanf. Use as many concepts from this note as you can. Gates and registers, the instruction cycle, memory-mapped I/O, interrupts, TRAPs, the stack, and pointers.