From Python to Production: How an AI-Powered Web App Actually Works

Part 0

Welcome

By the end of this note you will be able to read a real codebase that combines a Python backend, a TypeScript frontend, a Postgres database, large language models, and a deployment pipeline, and explain to a senior engineer how each piece fits.

The trick to learning this stack is not to memorise tools. The trick is to learn the small set of ideas that those tools repeatedly express. Once you see the ideas, every framework looks like a variation on the same theme.

How to use this note

Each module follows the same shape. First we name the problem the technology was invented to solve. Then we draw a mental model with a diagram. Then we write or read enough code to make the model concrete. Then we name the trade-offs. Then there is a prompt at the end that you should answer aloud, in your own words, before moving on.

If you cannot answer the prompt, the next module will not stick. Reread the diagram, reread the code, and try again. The note is designed so the ideas compound. Skipping is expensive.

A note on tools

Every code example in this note works as written. Where a snippet would only work with extra setup, the surrounding text says so explicitly. Where a snippet is illustrative pseudocode, it is labelled illustrative.

Part I. How the web actually works

The shape of a web application

Before any code, you need a single mental picture of where a web application lives and what travels between its pieces. Almost every problem in web development is a variation on the question, "which part is doing the work, and what is it telling the other parts?"

1. The browser, the server, and what travels between them

A web application is a conversation between two computers. One is yours. The other is owned by the company running the app, somewhere in a data centre.

Your computer runs a program called a browser. The browser knows how to do three things: download files over the network, render those files into a visible page, and run a small programming language called JavaScript on the page.

The other computer runs a program called a server. The server listens on the network for requests and decides what to send back.

Two computers, talking. Every web app is a conversation in this shape.

What an HTTP request looks like

The conversation uses a protocol called HTTP. A request is just a piece of text the browser sends to the server. It looks roughly like this.

GET /companies/AMZN HTTP/1.1
Host: api.example.com
Accept: application/json
Authorization: Bearer eyJ...

The first line says, "I want to GET the resource at the path /companies/AMZN". The next lines are headers: small pieces of metadata. The server reads this, decides what to do, and sends a response.

HTTP/1.1 200 OK
Content-Type: application/json

{
  "ticker": "AMZN",
  "name": "Amazon.com, Inc.",
  "sector": "Consumer Cyclical"
}

The first line is a status: 200 means it worked. The body is the answer. In this case the body is in JSON, a text format for structured data that looks almost identical to a Python dictionary.
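Because the response is plain text, a few lines of Python can take it apart. The sketch below is only an illustration of the format, assuming a well-formed response like the one above; real clients use an HTTP library rather than splitting strings by hand. The name parse_response is made up for this example.

```python
import json

RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: application/json\r\n"
    "\r\n"
    '{"ticker": "AMZN", "name": "Amazon.com, Inc.", "sector": "Consumer Cyclical"}'
)

def parse_response(raw: str) -> tuple[int, dict[str, str], dict]:
    # A blank line separates the status line and headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    status = int(status_line.split(" ")[1])          # "HTTP/1.1 200 OK" -> 200
    headers = dict(line.split(": ", 1) for line in header_lines)
    return status, headers, json.loads(body)         # JSON text -> Python dict

status, headers, company = parse_response(RAW_RESPONSE)
print(status, headers["Content-Type"], company["name"])
```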

The four ideas hidden in this picture

Network calls are slow. A round trip to a nearby server takes tens of milliseconds, and one across the world can take hundreds, even when nothing is wrong. So good apps hide latency by being asynchronous and by caching results.
Anyone can send a request. The server cannot trust the browser. It must validate every input and check who the caller is. Most security work in web development is a consequence of this fact.
The browser cannot keep secrets. Code that runs in the browser is visible to the user, so anything you ship to it can be read. Secrets must stay on the server.
State must live somewhere. Servers can crash and restart. So real state lives in a database, not in server memory.

Common mistake

Beginners often think the browser and the server share variables. They do not. Every interaction is a brand new request. The server has amnesia by default. Anything it needs to remember it must look up.

Test yourself

Explain why a server needs to validate input even if the frontend already validated it.

2. HTML, the document tree

The browser does not know what a "page" is. It knows what a tree of nodes is. HTML (HyperText Markup Language) is the format used to describe that tree.

The simplest possible HTML page is this.

<!DOCTYPE html>
<html>
  <head>
    <title>Hello</title>
  </head>
  <body>
    <h1>Hello, world.</h1>
    <p>This is a paragraph.</p>
  </body>
</html>

Each thing in angle brackets is a tag. Most tags come in pairs, one to open and one to close, and they nest. A few, such as img and input, stand alone with no closing tag. The whole document is a tree.

The HTML from the previous code block, drawn as the tree the browser actually builds.

The tree has a name: the DOM, which stands for Document Object Model. When you hear "DOM" later, it means this tree of nodes that the browser holds in memory.
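You can see the same tree from Python. The sketch below uses the standard library's HTMLParser to print one line per node, indented by depth. It is a rough outline, not how a browser builds the DOM: TreeOutliner is a made-up name, and the depth tracking assumes every tag has a closing tag, which holds for this page but not for stand-alone tags such as img.

```python
from html.parser import HTMLParser

PAGE = """<!DOCTYPE html>
<html>
  <head>
    <title>Hello</title>
  </head>
  <body>
    <h1>Hello, world.</h1>
    <p>This is a paragraph.</p>
  </body>
</html>"""

class TreeOutliner(HTMLParser):
    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + tag)
        self.depth += 1                      # children are one level deeper

    def handle_endtag(self, tag):
        self.depth -= 1                      # back up to the parent

    def handle_data(self, data):
        if data.strip():                     # skip whitespace between tags
            self.lines.append("  " * self.depth + repr(data.strip()))

outliner = TreeOutliner()
outliner.feed(PAGE)
print("\n".join(outliner.lines))
```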

Tags and attributes

Tags can carry attributes, small pieces of metadata that describe how the node behaves.

<a href="https://example.com">Visit</a>
<img src="cat.png" alt="A cat">
<input type="email" placeholder="you@example.com">

The browser only cares about a small list of standardised tags: paragraphs (p), headings (h1 through h6), links (a), images (img), forms and inputs, lists, tables, and a small set of semantic containers like main, section, and nav. That is most of it.

What HTML is not

HTML does not describe how things look. It describes structure and meaning. The browser has default styles (headings are big, paragraphs have spacing) but those are conventions, not part of the language.

HTML also does not have logic. It cannot react to clicks, change colour, or do arithmetic. That work is done by CSS and JavaScript, which we meet next.

Test yourself

What is the DOM, and how is it different from the HTML file the server sent?

3. CSS, painting the tree

HTML gives the browser a tree. CSS (Cascading Style Sheets) tells the browser how to draw it.

A CSS rule has two parts: a selector that picks nodes from the tree, and a declaration block that says what to do with them.

h1 {
  color: #1A1410;
  font-size: 32px;
  margin-bottom: 16px;
}

p {
  color: #4A4038;
  line-height: 1.6;
}

That is the entire syntax. The hard part of CSS is not the syntax. The hard part is the layout model: how the browser decides where things go on the screen.

The two layout models you need to know

For most modern UIs, you only need two layout systems.

Flexbox arranges children in a row or a column with rules for alignment and spacing.

.toolbar {
  display: flex;
  justify-content: space-between;  /* push items to the ends */
  align-items: center;             /* vertically centre them */
  gap: 12px;                       /* space between items */
}

Grid arranges children in a two-dimensional layout of rows and columns.

.dashboard {
  display: grid;
  grid-template-columns: 240px 1fr;  /* sidebar, main area */
  gap: 24px;
}

Almost every layout you see online is some combination of these two.

The cascade and specificity

Two CSS rules can match the same node and disagree. The browser resolves this with two rules.

Specificity. A more specific selector wins. p.intro beats plain p.
Order. If specificity ties, the rule written later in the file wins.

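This is what the word "cascading" in CSS refers to: when several rules match the same node, the browser works through its tie-breakers in order until one rule wins. The full cascade has a few more tie-breakers (where the stylesheet came from, and whether a declaration is marked !important), but specificity and order are the two you will meet daily.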
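The two tie-breakers can be written down as a small scoring function. This is a simplified sketch in Python, assuming selectors made only of tag names, classes, and ids; it ignores pseudo-classes, attribute selectors, and !important. The function names are invented for the example.

```python
import re

def specificity(selector: str) -> tuple[int, int, int]:
    """Return (ids, classes, elements) for a simple selector."""
    ids = len(re.findall(r"#[\w-]+", selector))
    classes = len(re.findall(r"\.[\w-]+", selector))
    # A tag name sits at the start of the selector or after a combinator.
    elements = len(re.findall(r"(?:^|[\s>+~])[a-zA-Z][\w-]*", selector))
    return ids, classes, elements

def winner(rules: list[tuple[str, str]]) -> str:
    """rules are (selector, value) pairs in source order; return the winning value."""
    # Compare specificity first; on a tie, the later position in the file wins.
    best = max(enumerate(rules), key=lambda item: (specificity(item[1][0]), item[0]))
    return best[1][1]

print(specificity("p.intro"))                                   # more specific than plain p
print(winner([("p.intro", "navy"), ("p", "grey"), ("p", "black")]))
print(winner([("p", "grey"), ("p", "black")]))                  # tie, so the later rule wins
```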

You will not write CSS like this in production

In real applications, hand-written CSS files become unmanageable. Modern projects use one of two approaches.

The first is CSS-in-JS: you write style rules inside your component code. The second is utility classes, where you compose styles from a fixed set of single-purpose classes.

The second approach has won. The most common implementation is a tool called Tailwind. We will see it later.

/* Hand-written CSS */
.button {
  background: black;
  color: white;
  padding: 8px 16px;
  border-radius: 6px;
}

<!-- The same thing, with Tailwind utility classes -->
<button class="bg-black text-white px-4 py-2 rounded-md">Save</button>

Tailwind looks ugly the first time you see it. Then you write a real app, change a colour, and discover you only had to change it in one place. The aesthetic argument loses the moment you maintain a codebase.

Test yourself

Given two CSS rules that both target the same element, how does the browser decide which one wins?

Part II. JavaScript, TypeScript, React

The frontend trinity

HTML is the structure. CSS is the look. JavaScript is the behaviour. Modern web apps rest on this trinity, plus two layers stacked on top: TypeScript, which adds types to JavaScript, and React, a library that lets you describe UI as a function of state.

4. JavaScript essentials for a Python person

JavaScript and Python have a lot in common. Both are dynamically typed, both have first-class functions, both have list and dictionary literals (JavaScript calls them arrays and objects). The differences are mostly cosmetic. What matters is a small set of mental shifts.

Variables

const name = "Pet Supplies Plus";   // cannot be reassigned
let count = 0;                       // can be reassigned
count = count + 1;

Use const by default. Use let only when the variable must be reassigned.

Functions

// classic syntax
function add(a, b) {
  return a + b;
}

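// arrow syntax (more common in modern code)
// given a different name here: a name declared with function cannot be declared again with const
const addArrow = (a, b) => a + b;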

Arrow functions are not just shorter syntax. They behave differently around the `this` keyword, but you rarely need `this` in modern code, so arrows are fine.

Objects and arrays

const company = {
  ticker: "AMZN",
  name: "Amazon.com",
  employees: 1500000,
};

// dot access or bracket access
company.ticker;
company["name"];

const tickers = ["AMZN", "GOOG", "MSFT"];
tickers.length;       // 3
tickers[0];           // "AMZN"

Async

JavaScript handles slow operations (network, files, timers) with a Promise, an object representing a value that may not exist yet.

async function getCompany() {
  const response = await fetch("https://api.example.com/companies/AMZN");
  const data = await response.json();
  return data;
}

The await keyword pauses the function until the Promise resolves. This is the same idea as await in Python's asyncio, where a coroutine pauses until the thing it awaits is ready. The mental model is the same.
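For comparison, here is the same shape in Python. This is a sketch, not a real client: fetch_json is a hypothetical stand-in that waits briefly and returns canned data, so the example runs without a network.

```python
import asyncio

async def fetch_json(url: str) -> dict:
    # Hypothetical stand-in for a network call: wait, then return canned data.
    await asyncio.sleep(0.01)
    return {"ticker": url.rsplit("/", 1)[-1]}

async def get_company() -> dict:
    # await pauses get_company here until fetch_json has a result.
    data = await fetch_json("https://api.example.com/companies/AMZN")
    return data

company = asyncio.run(get_company())
print(company)
```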

The DOM API

JavaScript has access to the document tree we met earlier. It can read it, change it, and listen for events.

const button = document.querySelector("button");
button.addEventListener("click", () => {
  alert("You clicked it.");
});

You will almost never write code like this directly. React replaces it with something much better. But it helps to know what is underneath.

Mental shift

In Python, you mostly run scripts that finish. In JavaScript in a browser, the program never ends. It sits in a loop, waiting for events: clicks, network responses, timers. Every line of frontend code is reactive.

Test yourself

What does `await` do, in one sentence, and why is it needed in JavaScript at all?

5. Why TypeScript exists

JavaScript is dynamically typed. A function written to take a number will silently accept a string and produce nonsense at runtime. In a 200 line script this is fine. In a 200,000 line application maintained by a team, it becomes ruinous.

TypeScript is JavaScript with optional type annotations. The annotations are checked by a separate compiler before the code runs. The compiler erases the annotations and emits plain JavaScript. The browser sees no difference.

// JavaScript
function total(price, quantity) {
  return price * quantity;
}

// TypeScript
function total(price: number, quantity: number): number {
  return price * quantity;
}

The TypeScript version refuses to compile if you call total("ten", 3). The runtime cost is zero because the types vanish before the code runs.
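Python's type hints follow the same arrangement: a separate checker such as mypy reads them, and the interpreter ignores them at runtime. The short sketch below shows what "produce nonsense at runtime" looks like when nothing checks the call first.

```python
def total(price: float, quantity: int) -> float:
    return price * quantity

print(total(9.5, 3))      # the intended use
# The hints are not enforced when the code runs, so a string slips through
# and Python happily repeats it three times instead of raising an error.
print(total("ten", 3))
```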

Why this matters more than you think

Types do three things at once.

They catch bugs at compile time instead of in production at 2am.
They are documentation that cannot drift. A function's signature tells you what it expects and what it returns. There is no need to read the body.
They drive editor tooling. Autocomplete, "go to definition", "find all usages", and refactor-rename all become reliable.

You feel the third one most. Once you have used a typed editor, working in plain JavaScript feels like writing in the dark.

Practical types you will see all the time

// primitive types
let ticker: string = "AMZN";
let employees: number = 1500000;
let isPublic: boolean = true;

// arrays
const tickers: string[] = ["AMZN", "GOOG"];

// object shapes (called "interfaces")
interface Company {
  ticker: string;
  name: string;
  employees: number;
  ipoDate: string | null;   // can be a string or null
}

function describe(c: Company): string {
  return `${c.name} (${c.ticker})`;
}

The pipe character (|) is a union: "this value is one of several types." You will use unions constantly to describe API responses where a field might be null.

Test yourself

If TypeScript types are erased before the code runs, why are they not just comments?

6. React, a way of describing UI

The DOM is mutable. JavaScript can reach into the tree and change any node, anywhere, at any time. This sounds powerful. In practice it produces bugs that are nearly impossible to track down, because the same piece of UI can be modified from a dozen places.

React was invented to solve this. The core idea is that you describe what the UI should look like as a function of state, and React is responsible for changing the DOM to match.

State changes. React calls render. React updates the DOM to match. You never touch the DOM yourself.

A first component

A React component is just a function that returns markup. The markup looks like HTML but is actually called JSX: an extension to JavaScript that the build step converts to plain function calls.

function Greeting({ name }) {
  return <p>Hello, {name}.</p>;
}

// Used like this:
<Greeting name="Pethuel" />

The curly braces inside JSX let you embed arbitrary JavaScript. Anything between { and } is evaluated and inserted.

State with hooks

A component cannot remember anything by itself. To hold state across renders it uses a hook called useState.

import { useState } from "react";

function Counter() {
  const [count, setCount] = useState(0);

  return (
    <div>
      <p>Count is {count}</p>
      <button onClick={() => setCount(count + 1)}>Add one</button>
    </div>
  );
}

Reading the line: useState(0) initialises a state value to 0. It returns a pair: the current value and a function that updates it. When the user clicks the button, the update function fires, the state changes, React re-renders the component, and the DOM is patched. You never touch the DOM yourself.

Effects: when you need to do something at a specific time

Sometimes a component needs to do something when it first appears, or when a value changes. The hook for that is useEffect.

import { useEffect, useState } from "react";

function CompanyDetails({ ticker }) {
  const [company, setCompany] = useState(null);

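  useEffect(() => {
    let cancelled = false;
    fetch(`/api/companies/${ticker}`)
      .then((r) => r.json())
      .then((data) => {
        // ignore a response that arrives after ticker has changed
        if (!cancelled) setCompany(data);
      });
    // cleanup: runs before the next effect and when the component disappears
    return () => {
      cancelled = true;
    };
  }, [ticker]);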

  if (!company) return <p>Loading...</p>;
  return <h1>{company.name}</h1>;
}

The second argument to useEffect (the array) tells React which values to watch. The effect runs when the component first appears, and again whenever ticker changes.

Mental model

In React, you never describe the change. You describe the new state, and React figures out the change. This is what people mean when they say React is "declarative."
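The idea does not depend on React or even on JavaScript. Here is a toy version in Python, offered only as a mental model: render turns state into a list of lines, and patch works out which lines differ. React's real reconciliation is far more involved; both function names are invented for this sketch.

```python
def render(state: dict) -> list[str]:
    # The UI is a pure function of state: same state, same output.
    return [f"Count is {state['count']}", "[Add one]"]

def patch(old: list[str], new: list[str]) -> list[tuple[int, str]]:
    # Compare the two descriptions and keep only the lines that changed.
    return [(i, line) for i, line in enumerate(new) if i >= len(old) or old[i] != line]

state = {"count": 0}
screen = render(state)

state = {"count": state["count"] + 1}    # describe the new state...
next_screen = render(state)
changes = patch(screen, next_screen)     # ...and let the library find the change
print(changes)
```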

Test yourself

Why is it useful that React abstracts away DOM mutations? Name a class of bugs this prevents.

Part III. A backend, built

The server, from zero

The server is the part of a web application that lives on the data centre side of the diagram from Module 1. It listens for HTTP requests, decides what each one means, talks to a database, calls external services like a language model, and returns a response.

This part builds your understanding of a backend from the ground up: the protocol, the framework, the type system, the async model, and the patterns for slow work.

7. HTTP, REST, and APIs

An API (Application Programming Interface) is a contract: a set of named operations the server promises to support, and the format of the inputs and outputs.

For web apps, that contract is almost always expressed as HTTP endpoints. Each endpoint is identified by a method and a path.

Method	Means	Used for
`GET`	Read	Fetching data, never changes anything
`POST`	Create	Creating a new thing
`PATCH`	Update	Modifying part of an existing thing
`PUT`	Replace	Replacing a thing entirely
`DELETE`	Delete	Removing a thing

REST: a convention for naming endpoints

REST is a style for designing HTTP APIs. The core idea: the path identifies a resource, and the method identifies what you want to do with it.

GET    /companies            # list all companies
POST   /companies            # create a new one
GET    /companies/AMZN       # read one specific company
PATCH  /companies/AMZN       # update fields of one company
DELETE /companies/AMZN       # remove it

This convention is so widespread that experienced developers can guess most of an API just from one or two examples.

Status codes

Every response carries a numeric status. You will use a small handful constantly.

Code	Means	When
`200`	OK	Everything is fine, here is your data
`201`	Created	The new thing was made
`400`	Bad Request	Your input is malformed
`401`	Unauthorized	You did not prove who you are
`403`	Forbidden	You proved who you are, but you cannot do this
`404`	Not Found	The resource at that path does not exist
`500`	Internal Server Error	The server hit an unexpected error while handling your request

JSON: how the body is shaped

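The body of a request or response is usually JSON. JSON looks almost identical to a Python dictionary that holds lists. The visible differences are small: JSON writes true, false, and null where Python writes True, False, and None, and strings always use double quotes.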

{
  "id": "1c7f-...",
  "ticker": "AMZN",
  "financials": [
    { "year": 2024, "revenue_usd_m": 638000 },
    { "year": 2025, "revenue_usd_m": 716920 }
  ]
}

Keys are strings. Values are strings, numbers, booleans, null, arrays, or nested objects. That is the whole format.
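Python's standard json module converts in both directions. A minimal example, using made-up field values:

```python
import json

company = {"ticker": "AMZN", "is_public": True, "ipo_date": None}

text = json.dumps(company)         # Python dict -> JSON text
round_tripped = json.loads(text)   # JSON text -> Python dict

print(text)
print(round_tripped == company)
```

Note how True and None become true and null in the text, and come back unchanged on the way in.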

Test yourself

What status code should you return when a user is logged in but tries to access a workspace they are not a member of?

8. FastAPI, a server in 30 lines

To build an HTTP server in Python you need a framework. FastAPI is the modern default: it is fast, type-driven, and produces an interactive documentation page automatically.

Here is a complete server, including a route that lists companies and a route that fetches one.

from fastapi import FastAPI, HTTPException

app = FastAPI()

COMPANIES = {
    "AMZN": {"name": "Amazon.com, Inc.", "sector": "Consumer Cyclical"},
    "GOOG": {"name": "Alphabet Inc.", "sector": "Communication Services"},
}

@app.get("/companies")
def list_companies():
    return [{"ticker": t, **c} for t, c in COMPANIES.items()]

@app.get("/companies/{ticker}")
def get_company(ticker: str):
    if ticker not in COMPANIES:
        raise HTTPException(404, f"unknown ticker: {ticker}")
    return {"ticker": ticker, **COMPANIES[ticker]}

Save that to main.py, install the two packages it needs (pip install fastapi uvicorn), and run uvicorn main:app --reload. You now have a real HTTP server. Visit http://localhost:8000/companies in a browser and you will see JSON.

How FastAPI maps a request to a function

The decorator @app.get("/companies/{ticker}") tells FastAPI: when a GET request arrives at a path matching this pattern, call this function. The piece of the path inside curly braces becomes a parameter.

FastAPI parses the path, type-coerces the parameter to str, and calls your function.
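To make the mechanism less magical, here is a toy router in plain Python. It is not FastAPI's implementation, only a sketch of the idea: a decorator records the handler against a path pattern, and a dispatcher uses the handler's type annotations to convert the path pieces. The names get, dispatch, and ROUTES, and the /filings/{year} route, are invented for this example. It assumes every path parameter is annotated and leaves out error handling.

```python
import inspect
import re

ROUTES = []  # (compiled pattern, handler) pairs, in registration order

def get(template: str):
    """Register a handler for a path template such as "/companies/{ticker}"."""
    # Turn each {name} into a named regex group that matches one path segment.
    regex = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", template)
    pattern = re.compile(f"^{regex}$")

    def register(handler):
        ROUTES.append((pattern, handler))
        return handler
    return register

def dispatch(path: str):
    for pattern, handler in ROUTES:
        match = pattern.match(path)
        if match is None:
            continue
        params = inspect.signature(handler).parameters
        # Convert each raw string using the annotation on the matching parameter.
        kwargs = {name: params[name].annotation(raw)
                  for name, raw in match.groupdict().items()}
        return handler(**kwargs)
    raise LookupError(f"no route for {path}")

@get("/companies/{ticker}")
def get_company(ticker: str):
    return {"ticker": ticker}

@get("/filings/{year}")
def list_filings(year: int):
    return {"year": year}

print(dispatch("/companies/AMZN"))
print(dispatch("/filings/2024"))
```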

The killer feature: automatic docs

FastAPI inspects your type annotations and generates an interactive API documentation page at /docs. You can call your endpoints from the browser, see the schemas, and copy the request as curl. This is one of the reasons it took over the Python web world.

Dependency injection

Real handlers do not just return literals. They need a database connection, a logged-in user, and configuration. FastAPI handles this with dependencies: small functions that produce the things your handler needs.

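# Illustrative: open_database_connection() and db.fetch_one() stand in for
# whatever database library you use.
from fastapi import Depends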

def get_db():
    db = open_database_connection()
    try:
        yield db
    finally:
        db.close()

@app.get("/companies/{ticker}")
def get_company(ticker: str, db = Depends(get_db)):
    return db.fetch_one("SELECT * FROM companies WHERE ticker = ?", ticker)

FastAPI sees Depends(get_db) in the signature, calls get_db for you, passes the result, and runs the cleanup after the handler returns. This pattern is everywhere in real code.
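The yield in get_db is what makes the cleanup possible. The sketch below shows the idea with plain Python generators. It is not FastAPI's code: call_with_dependency and FakeConnection are hypothetical names, and the events list exists only so the order of operations is visible.

```python
events = []

class FakeConnection:
    def fetch_one(self, ticker):
        events.append(f"query {ticker}")
        return {"ticker": ticker}

    def close(self):
        events.append("close")

def get_db():
    db = FakeConnection()
    events.append("open")
    try:
        yield db
    finally:
        db.close()

def call_with_dependency(handler, dependency, *args):
    generator = dependency()
    resource = next(generator)       # run the dependency up to its yield
    try:
        return handler(*args, resource)
    finally:
        # close() raises GeneratorExit at the yield, so the finally block runs.
        generator.close()

def get_company(ticker, db):
    return db.fetch_one(ticker)

result = call_with_dependency(get_company, get_db, "AMZN")
print(result, events)
```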

Test yourself

If FastAPI types parameters automatically, what should happen if a client sends `GET /items/abc` when the handler expects an integer?

9. Pydantic, schemas as truth

Routes are easy. The hard problem in a web backend is making sure that every value flowing in or out of the system has the shape you expect. Pydantic is the library that makes this easy in Python.

A Pydantic model is a Python class with annotated fields. It validates input, coerces types, and serialises back to JSON.

from pydantic import BaseModel, Field
from datetime import datetime

class Company(BaseModel):
    ticker: str
    name: str
    sector: str | None = None
    employees: int = Field(ge=0)        # must be >= 0
    ipo_date: datetime | None = None

A few things in this small example are doing a lot of work.

str | None = None means the field is optional; if missing it defaults to None.
Field(ge=0) adds a constraint: greater than or equal to zero. Pydantic will refuse to accept a negative number.
datetime means Pydantic will accept an ISO date string from JSON and convert it into a Python datetime automatically.

Pydantic with FastAPI

FastAPI is built on Pydantic. When you declare a request body of a Pydantic type, validation happens for free.

class CreateCompanyRequest(BaseModel):
    ticker: str = Field(min_length=1, max_length=10)
    name: str

@app.post("/companies", response_model=Company)
def create_company(body: CreateCompanyRequest):
    # body is already validated. body.ticker is a str. body.name is a str.
    return Company(
        ticker=body.ticker.upper(),
        name=body.name,
        employees=0,
    )

If the client sends {"ticker": "", "name": "X"}, FastAPI returns a structured 422 error before create_company is even called. The handler can assume its inputs are valid. This single property eliminates a huge category of bugs.

Why this is a big deal

In a typical Python web app, a large share of the bugs come from the boundary between the outside world and the program. The data does not arrive in the shape you expected. Pydantic moves all that uncertainty to a single place: the schema. The rest of your code can rely on the types.

Key insight

Treat your Pydantic schemas as the source of truth for the shape of your data. Database models, API contracts, and frontend types should all derive from or align with these schemas. This idea, "one schema, many places," is what keeps a complex app coherent.

Test yourself

What error do you expect FastAPI to return if a client posts a body where `ticker` is an integer instead of a string?

10. Async Python: what await actually does

Python has two flavours of function. The synchronous kind (the kind you already know) runs from top to bottom. The asynchronous kind (introduced by the async keyword) can pause itself and let something else run while it waits.

This matters because a web server spends most of its time waiting on slow things: a database query, an HTTP call to another service, a file read. A synchronous worker can only handle one request at a time, so it sits idle during every wait. With async, one process can handle thousands of requests concurrently because most of them are waiting.

import httpx  # third-party: pip install httpx

# Sync version. Blocks the whole process during the network call.
def fetch_company(ticker: str):
    response = httpx.get(f"https://api.example.com/{ticker}")
    return response.json()

# Async version. The await releases the process while the request is in flight.
async def fetch_company(ticker: str):
    async with httpx.AsyncClient() as client:
        response = await client.get(f"https://api.example.com/{ticker}")
    return response.json()

The shape of the code is almost identical. The difference is in what happens at await: control yields back to the event loop, which can run other coroutines while the network call is pending.

Async lets a single process interleave many requests. The CPU is busy. The waits run in parallel.
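You can measure the overlap yourself. In this sketch three pretend requests each wait a tenth of a second; slow_call is a hypothetical stand-in that uses asyncio.sleep in place of real I/O.

```python
import asyncio
import time

async def slow_call(name: str, seconds: float) -> str:
    await asyncio.sleep(seconds)     # stands in for a database or network wait
    return name

async def main():
    started = time.perf_counter()
    # gather starts all three and waits for them together.
    results = await asyncio.gather(
        slow_call("db", 0.1),
        slow_call("llm", 0.1),
        slow_call("cache", 0.1),
    )
    return results, time.perf_counter() - started

results, elapsed = asyncio.run(main())
print(results, f"{elapsed:.2f}s")
```

The total is close to one wait, not the sum of three, because the waits overlap.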

Two rules to keep this simple

You can only await inside an async function. If you try to await in a regular function, Python raises a SyntaxError.
An async function called without await does nothing useful. It returns a coroutine object (a piece of work that has not started yet). To run it, something must await it, or pass it to asyncio.run().

The big mistake

Calling a synchronous, blocking function (like reading a large file or sleeping with time.sleep) inside an async handler. The whole event loop stops. Suddenly your server cannot handle any other requests until the blocking call returns.

Solution: use the async version of the library. httpx.AsyncClient instead of requests. asyncio.sleep instead of time.sleep. aiofiles instead of plain open.
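When no async version of a library exists, one common fallback is to push the blocking call onto a worker thread with asyncio.to_thread (Python 3.9 and later). The sketch below is an illustration under that assumption: heartbeat stands in for other requests on the same event loop, and time.sleep stands in for any blocking call.

```python
import asyncio
import time

async def heartbeat(ticks: list) -> None:
    # Stands in for other requests being served by the same event loop.
    while True:
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.01)

async def handler() -> int:
    ticks: list = []
    other_work = asyncio.create_task(heartbeat(ticks))
    # time.sleep blocks, so run it in a worker thread instead of on the loop.
    await asyncio.to_thread(time.sleep, 0.1)
    other_work.cancel()
    return len(ticks)

tick_count = asyncio.run(handler())
print(tick_count)   # several ticks: the loop kept running during the blocking call
```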

Test yourself

What happens if an async FastAPI route calls a synchronous database driver that blocks for two seconds while a hundred users are hitting the server?

11. Background tasks and schedulers

Some work is too slow to do during a request. If you generate a multi-page report that takes 30 seconds, you cannot make the user's browser wait. You return a fast response that says "we are working on it" and do the slow work in the background.

Two patterns for background work

Pattern 1: in-process background tasks. The work runs in the same process as the API. FastAPI ships with this built in.

# Illustrative: save_to_disk is a placeholder for your own storage code.
from fastapi import BackgroundTasks, UploadFile

@app.post("/documents/upload")
async def upload(file: UploadFile, tasks: BackgroundTasks):
    document_id = await save_to_disk(file)
    tasks.add_task(parse_and_embed, document_id)
    return {"id": document_id, "status": "processing"}

async def parse_and_embed(document_id):
    # slow work. parse the file, chunk it, generate embeddings, write to DB.
    ...

This pattern is fine for small, fast tasks. It is bad for long, expensive ones because if the server restarts, the task is lost.

Pattern 2: a job queue. Work is written into a special queue (in Postgres or Redis) and picked up by a separate worker process. If the API server restarts, the work is still in the queue. If the worker crashes, the next worker picks it up.

The queue is the boundary between fast work (handling requests) and slow work (processing a document).

Schedulers: work that runs on a clock

Some work is not user-driven at all. Re-scoring an investment thesis every night, refreshing data from an external API every hour, sending a weekly digest at 8am: these need a scheduler.

A simple scheduler is just an async loop that wakes up periodically and asks, "is there any work due right now?"

import asyncio

# Illustrative: find_jobs_due_now and run are placeholders for your own code.
async def scheduler():
    while True:
        due = await find_jobs_due_now()
        for job in due:
            await run(job)
        await asyncio.sleep(60)   # check every minute

That is the entire pattern. Real schedulers add error handling, retries, and metrics, but the core is a loop that checks the clock.

Test yourself

You need to send a daily summary email to every active user. Should you use background tasks, a job queue, or a scheduler? Why?

Part IV. Databases

Where the truth lives

The server has amnesia. Real applications need a place that remembers everything. That place is a database.

This part covers the language databases speak, how Python apps talk to them safely, how to evolve their structure over time without losing data, and how Postgres goes far beyond rows of text and numbers.

12. SQL, the language of structured data

A relational database stores data in tables. Each table has a fixed set of named columns. Each row in the table has a value for each column.

id	ticker	name	employees
1	AMZN	Amazon.com, Inc.	1,500,000
2	GOOG	Alphabet Inc.	183,000
3	NVDA	NVIDIA Corporation	29,600

SQL (Structured Query Language) is the language used to describe what you want to read or write.

-- read
SELECT ticker, name FROM companies WHERE employees > 100000;

-- write
INSERT INTO companies (ticker, name, employees)
VALUES ('MSFT', 'Microsoft Corporation', 221000);

-- update
UPDATE companies SET employees = 230000 WHERE ticker = 'MSFT';

-- delete
DELETE FROM companies WHERE ticker = 'MSFT';

Four verbs. SQL has more, but you will use these four most.

The two ideas that make SQL powerful

The first is foreign keys. A column in one table can reference a row in another. This is how you express relationships.

CREATE TABLE filings (
  id        UUID PRIMARY KEY,
  company_id UUID REFERENCES companies(id),
  form_type TEXT,
  filed_at  DATE
);

The second is JOINs: querying across multiple tables.

SELECT c.name, f.form_type, f.filed_at
FROM companies c
JOIN filings f ON f.company_id = c.id
WHERE c.ticker = 'AMZN'
ORDER BY f.filed_at DESC;

This pulls the company and all its filings together in a single query. The database does the heavy lifting; your code receives flat rows.
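You can try the join without installing Postgres. The sketch below runs the same query against an in-memory SQLite database from Python. It is a stand-in under stated assumptions: integer ids replace UUIDs, dates are stored as text, and the filing rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (id INTEGER PRIMARY KEY, ticker TEXT UNIQUE, name TEXT);
CREATE TABLE filings (
  id         INTEGER PRIMARY KEY,
  company_id INTEGER REFERENCES companies(id),
  form_type  TEXT,
  filed_at   TEXT
);
INSERT INTO companies VALUES (1, 'AMZN', 'Amazon.com, Inc.'), (2, 'GOOG', 'Alphabet Inc.');
-- made-up sample filings
INSERT INTO filings VALUES
  (1, 1, '10-K', '2025-02-07'),
  (2, 1, '10-Q', '2025-05-02'),
  (3, 2, '10-K', '2025-02-05');
""")

rows = conn.execute("""
    SELECT c.name, f.form_type, f.filed_at
    FROM companies c
    JOIN filings f ON f.company_id = c.id
    WHERE c.ticker = 'AMZN'
    ORDER BY f.filed_at DESC
""").fetchall()

for row in rows:
    print(row)   # one flat tuple per filing, newest first
```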

Indexes

By default a database scans every row to find matches. An index is a sorted lookup structure on one or more columns that turns linear scans into logarithmic lookups.

CREATE INDEX idx_companies_ticker ON companies(ticker);

Now WHERE ticker = 'AMZN' finds the row in microseconds even on a million-row table. The trade-off: indexes take up space and slow down inserts. Add them where you query, not where you write.
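Databases will tell you whether a query uses an index. The sketch below asks SQLite, as a stand-in for Postgres, for its query plan before and after creating the index. The exact wording of the plan varies between SQLite versions, so treat the printed text as indicative; the table contents are generated filler.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (id INTEGER PRIMARY KEY, ticker TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO companies (ticker, name) VALUES (?, ?)",
    [(f"T{i}", f"Company {i}") for i in range(1000)],   # filler rows
)

def plan(sql: str) -> str:
    # The fourth column of each EXPLAIN QUERY PLAN row is the description.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT name FROM companies WHERE ticker = 'T500'"

before = plan(query)
conn.execute("CREATE INDEX idx_companies_ticker ON companies(ticker)")
after = plan(query)

print("before:", before)   # a scan of the whole table
print("after: ", after)    # a search that uses idx_companies_ticker
```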

Read this carefully

SQL is more important than any single backend framework you will learn. FastAPI may go out of fashion. SQL has been the language of structured data for fifty years. Time spent learning SQL compounds for your entire career.

Test yourself

What is the difference between a primary key and a foreign key, and what happens if you delete a row that another row's foreign key points to?

13. SQLAlchemy and the async ORM

You can run SQL strings directly from Python. It works. It is also dangerous (string concatenation invites SQL injection) and tedious (turning rows back into Python objects by hand).

An ORM (Object-Relational Mapper) maps SQL rows to Python objects and vice versa. SQLAlchemy is the standard ORM in Python.

from sqlalchemy.orm import Mapped, mapped_column, DeclarativeBase
from sqlalchemy import Text, Integer
import uuid

class Base(DeclarativeBase): pass

class Company(Base):
    __tablename__ = "companies"

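    # default=uuid.uuid4 generates an id for new rows, so callers need not pass one
    id: Mapped[uuid.UUID] = mapped_column(primary_key=True, default=uuid.uuid4)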
    ticker: Mapped[str] = mapped_column(Text, unique=True)
    name: Mapped[str] = mapped_column(Text)
    employees: Mapped[int] = mapped_column(Integer, default=0)

Now Python can read and write companies as objects.

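# Assumes an open AsyncSession named session (see "Async ORM" below).
from sqlalchemy import select

company = Company(ticker="AMZN", name="Amazon.com, Inc.", employees=1500000)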
session.add(company)
await session.commit()

# later
result = await session.execute(
    select(Company).where(Company.ticker == "AMZN")
)
amzn = result.scalar_one()
print(amzn.name)

SQLAlchemy generates the SQL for you. It also passes values to the database as bound parameters rather than pasting them into the query text, which rules out SQL injection as long as you do not build raw SQL strings yourself.
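The difference between pasting a value into the query and binding it as a parameter is easy to demonstrate. This sketch uses the standard library's sqlite3 rather than SQLAlchemy so that it runs without setup; the table and the hostile input are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (ticker TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO companies VALUES (?, ?)",
    [("AMZN", "Amazon.com, Inc."), ("GOOG", "Alphabet Inc.")],
)

user_input = "nothing' OR '1'='1"   # hostile input pretending to be a ticker

# Unsafe: the input becomes part of the SQL text and rewrites the WHERE clause.
unsafe_sql = f"SELECT ticker FROM companies WHERE ticker = '{user_input}'"
leaked = conn.execute(unsafe_sql).fetchall()

# Safe: the input is sent separately and is only ever compared as a value.
safe = conn.execute(
    "SELECT ticker FROM companies WHERE ticker = ?", (user_input,)
).fetchall()

print(len(leaked), len(safe))
```

The unsafe query returns every row in the table. The bound version returns nothing, because no company has that odd string as its ticker.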

The session

A session is a unit of work. You open one at the start of a request, use it to read and write, and commit or roll back at the end.

If you commit, the database accepts every change as a single atomic transaction. If anything goes wrong before the commit, you roll back and nothing is saved. This guarantee is called atomicity, the A in ACID: either all of the changes happen, or none of them do.
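The all-or-nothing behaviour is easy to see with a transaction that fails halfway. The sketch below uses sqlite3 from the standard library as a stand-in for a SQLAlchemy session; the second insert is deliberately invalid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (ticker TEXT PRIMARY KEY, name TEXT NOT NULL)")

try:
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute("INSERT INTO companies VALUES ('AMZN', 'Amazon.com, Inc.')")
        # Same primary key again: this statement fails.
        conn.execute("INSERT INTO companies VALUES ('AMZN', 'Duplicate')")
except sqlite3.IntegrityError:
    pass

count = conn.execute("SELECT COUNT(*) FROM companies").fetchone()[0]
print(count)   # the first insert was rolled back along with the second
```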

Async ORM

Modern SQLAlchemy supports the async API discussed in Module 10. You write the same kind of code, but every database call is awaited.

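# Illustrative: assumes engine was created with create_async_engine(...)
# and that this runs inside an async function.
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

async with AsyncSession(engine) as session: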
    result = await session.execute(select(Company))
    companies = result.scalars().all()

Combine this with FastAPI's dependency injection and you get a clean pattern: every request gets its own session, automatically cleaned up.

Test yourself

Why does using an ORM eliminate the most common kind of SQL injection bug?

14. Migrations: schema as code

Real applications change. You add a column, you split a table, you rename something. The schema is not static.

The wrong way to manage schema changes: log into the database and run ALTER TABLE by hand. This works for one developer for one minute. It falls apart the moment you have a teammate, a staging environment, or a production database.

The right way: migrations. Each change is a small, dated file that describes how to evolve the schema. Migrations are run in order, and the database remembers which ones have been applied.

-- supabase/migrations/20260423000010_add_companies_table.sql
CREATE TABLE companies (
  id        UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  ticker    TEXT UNIQUE,
  name      TEXT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- supabase/migrations/20260508000010_add_signup_requests.sql
CREATE TABLE signup_requests (
  id     UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  email  TEXT NOT NULL,
  status TEXT NOT NULL DEFAULT 'pending'
);

Each migration has a timestamp prefix so the order is unambiguous. The migration tool keeps a small bookkeeping table in the database recording which migrations have already run. When you apply migrations, new ones run in order and ones already recorded are skipped.
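
The bookkeeping is simple enough to sketch in a few lines. This is a toy runner, not how any particular migration tool is implemented: it uses sqlite3 so it runs anywhere, the SQL is simplified to types SQLite understands, and the schema_migrations table name is an assumption made for the example.

```python
import sqlite3

# Hypothetical migrations keyed by filename; a real tool would read them from disk.
MIGRATIONS = {
    "20260508000010_add_signup_requests.sql":
        "CREATE TABLE signup_requests (id TEXT PRIMARY KEY, email TEXT NOT NULL, "
        "status TEXT NOT NULL DEFAULT 'pending')",
    "20260423000010_add_companies_table.sql":
        "CREATE TABLE companies (id TEXT PRIMARY KEY, ticker TEXT UNIQUE, name TEXT NOT NULL)",
}

def migrate(conn, migrations):
    """Apply unapplied migrations in filename order; return the names that ran."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_migrations (filename TEXT PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT filename FROM schema_migrations")}
    ran = []
    for filename in sorted(migrations):  # timestamp prefix: sorted order is chronological
        if filename in applied:
            continue  # already recorded, skip
        conn.execute(migrations[filename])
        conn.execute("INSERT INTO schema_migrations VALUES (?)", (filename,))
        conn.commit()
        ran.append(filename)
    return ran

conn = sqlite3.connect(":memory:")
first_run = migrate(conn, MIGRATIONS)   # both files, oldest first
second_run = migrate(conn, MIGRATIONS)  # nothing left to do
```

Running the function twice is the point: the second call is a no-op, which is what makes it safe to run migrations on every deploy.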

Why this matters

You can spin up a database from scratch deterministically: run all migrations in order, get a known schema.
Schema changes go through code review like any other code.
Production rollbacks are possible: write a "down" migration that undoes a change.

Most importantly, the schema lives in your repository. Anyone reading the code can see how the database is structured without logging in to inspect it.

Common mistake

Editing an existing migration after it has been run anywhere. Once a migration has been applied to any environment, it is frozen. To change something, write a new migration. Editing an old one breaks the contract that migrations are an append-only ledger.

Test yourself

Why is keeping schema changes in version control arguably even more important than keeping application code there?

15. Postgres beyond rows

Postgres is not just a relational database. It has a set of features that turn it into a small operating system for data. Three of them matter for modern AI apps.

JSONB: structured but flexible data

A column of type JSONB stores arbitrary JSON. You can query inside it, index parts of it, and update fields without rewriting the whole document.

CREATE TABLE briefs (
  id        UUID PRIMARY KEY,
  sections  JSONB NOT NULL,
  ...
);

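-- query inside the JSON: sections is an array of {title, body} objects,
-- so unpack the array and test each element's title
SELECT id FROM briefs
WHERE EXISTS (
  SELECT 1 FROM jsonb_array_elements(sections) AS s
  WHERE s->>'title' ILIKE '%Amazon%'
);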

This is useful when the shape of data varies. A research brief has sections, each with a title and body, but the number of sections is not fixed. JSONB stores the whole structure in one column.

Full-text search

Postgres can index text for natural language search.

SELECT id, body
FROM notes
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'tariff & risk');

This is a real search engine inside the database. Stemming, stop words, ranking by relevance: all included.

pgvector: similarity search for embeddings

This is the killer feature for AI apps. pgvector is an extension that adds a vector column type and operators for similarity search.

CREATE EXTENSION vector;

CREATE TABLE document_chunks (
  id        UUID PRIMARY KEY,
  text      TEXT,
  embedding vector(384)   -- a 384-dimensional vector
);

-- find the chunks most similar to a query embedding
SELECT id, text
FROM document_chunks
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 10;

The <=> operator is cosine distance. The query returns the chunks closest in vector space to the query, which (when the embeddings were generated correctly) are the ones most similar in meaning.

We will revisit pgvector in Part VI when we build retrieval-augmented generation. For now, just know that "find similar text" is a single SQL query when your database speaks vectors.
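
To see what the cosine distance operator computes, here is a plain-Python sketch of the same calculation followed by a brute-force version of ORDER BY ... LIMIT. The three-dimensional vectors and chunk names are made up for illustration; real embeddings in this note have 384 dimensions, and pgvector does this work inside the database.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity: 0 means same direction, 1 orthogonal, 2 opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Toy "embeddings" for three chunks.
chunks = {
    "pricing commentary": [0.9, 0.1, 0.0],
    "pricing outlook":    [0.8, 0.3, 0.1],
    "office relocation":  [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]

# Equivalent of: ORDER BY embedding <=> query LIMIT 2
nearest = sorted(chunks, key=lambda name: cosine_distance(chunks[name], query))[:2]
print(nearest)  # ['pricing commentary', 'pricing outlook']
```

Cosine distance ignores vector length and compares direction only, which is why it is a common default for text embeddings.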

Test yourself

Name three reasons to keep your data in one Postgres database rather than splitting structured data, search, and vectors across three specialised systems.

Part V. Authentication and security

Knowing who is asking

Every server problem eventually becomes "who is making this request, and what are they allowed to do?" This part teaches the small set of mechanisms that answer that question reliably.

16. How web auth actually works

Web authentication is a sequence of moves the browser and the server perform together. Strip away the buzzwords and it is two questions, asked in order.

Authentication. Who are you? Prove it.
Authorisation. Are you allowed to do this thing?

The first one happens once, at sign-in. The second one happens on every request.

The classic flow with passwords

The user types an email and a password into a form. The browser sends them to the server.
The server hashes the password and compares it to the hash it stored earlier. If they match, the user is who they claim to be.
The server gives the browser a token: a long, random, hard to forge string. The browser stores it.
For every subsequent request, the browser includes the token in a header. The server reads it and looks up which user it represents.

The token is a stand-in for "I already proved who I am." It is what saves the user from logging in on every click.

Why hashing matters

The server never stores raw passwords. It stores a one-way hash (a fingerprint). When the user signs in, the server hashes the submitted password and compares fingerprints. If the database is stolen, the attacker has hashes, not passwords.

The hash function used must be slow on purpose (bcrypt, argon2). A fast hash lets attackers try billions of guesses per second. A slow hash makes that impractical.
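
A minimal sketch of the store-and-verify cycle follows. It uses PBKDF2 from Python's standard library as a stand-in for bcrypt or argon2, which need third-party packages; the iteration count is an illustrative work factor, not a recommendation. Note the per-user random salt, which stops two users with the same password from sharing a hash.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor; tune so one hash takes a noticeable fraction of a second

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); both are stored alongside the user record."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the submitted password with the stored salt and compare fingerprints."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    # compare_digest runs in constant time, so timing does not leak how many bytes matched
    return hmac.compare_digest(digest, expected)

salt, stored = hash_password("correct horse battery staple")
good = verify_password("correct horse battery staple", salt, stored)
bad = verify_password("password123", salt, stored)
```

In a real application you would call a maintained password-hashing library rather than assemble these pieces yourself.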

Never do this

Store passwords in plain text. Use a fast hash like SHA256 to "encrypt" them. Roll your own crypto. Each of these is a line of code that ends careers. Use a battle-tested library.

Test yourself

Why is the token stored in the browser, not in a session table on the server, in modern apps?

17. JWTs, sessions, and Supabase

The token in the previous module can take two shapes.

Session token. A random string. The server keeps a table mapping tokens to users. To check who is making a request, look up the token in the table.

JWT (JSON Web Token). A signed JSON object that carries the user's identity inside it. To check it, the server verifies the signature; no database lookup needed.

A JWT looks like three Base64url-encoded segments separated by dots.

eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.
eyJzdWIiOiJ1c2VyXzAxIiwiZW1haWwiOiJqQGV4YW1wbGUuY29tIn0.
mO5MYZfXnYvI4cRcW1Xj4cM5tXq2mw5NzJ5kVNiGV7M

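The first segment is a header. The second is the payload. The third is the signature. The signature is computed from the first two segments and a key only the issuer holds: a shared secret for algorithms like HS256, or a private key for asymmetric algorithms like ES256. If anyone changes the payload, the signature no longer matches. Note that the payload is encoded, not encrypted. Anyone holding the token can read it, so never put secrets in it.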
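
The signing scheme can be written out with the standard library alone. The sketch below builds and checks an HS256 token by hand to show the mechanics; the secret is a placeholder, and the verify function deliberately skips checks a real library performs (expiry, audience, algorithm allow-list), so treat it as a teaching aid rather than something to deploy.

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    """Base64url without padding, the encoding JWT segments use."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def compact(obj: dict) -> bytes:
    return json.dumps(obj, separators=(",", ":")).encode()

def sign(payload: dict, secret: bytes) -> str:
    signing_input = b64url(compact({"alg": "HS256", "typ": "JWT"})) + "." + b64url(compact(payload))
    signature = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)

def verify(token: str, secret: bytes) -> dict:
    header_b64, payload_b64, signature_b64 = token.split(".")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("signature does not match")
    return json.loads(b64url_decode(payload_b64))

SECRET = b"demo-secret"  # hypothetical; a real secret is long, random, and kept out of code
token = sign({"sub": "user_01", "email": "j@example.com"}, SECRET)
claims = verify(token, SECRET)

# Tampering: swap in a new payload but keep the old signature.
header_b64, _, signature_b64 = token.split(".")
forged_payload = b64url(compact({"sub": "admin"}))
forged = ".".join([header_b64, forged_payload, signature_b64])
# verify(forged, SECRET) raises ValueError
```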

Why JWTs are popular

The server can verify a JWT without a database lookup. This means stateless servers: any instance can handle any request without sharing memory. It scales well horizontally.

The downside

Once a JWT is issued, you cannot revoke it without extra machinery. If a token is stolen, it remains valid until it expires. Workarounds (short expiry plus a refresh token, or a denylist) add complexity.

Supabase

Supabase is a managed Postgres + auth + storage service. Most importantly for this note, it implements the JWT flow above and gives you a small library for both browser and server.

In the browser:

const { data, error } = await supabase.auth.signInWithPassword({
  email: "you@example.com",
  password: "correct horse battery staple",
});

// supabase stashes the JWT in localStorage automatically
// every subsequent request can include it

On the server:

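import os
from jwt import decode, PyJWKClient

# Assumes the project URL is in the environment and the project uses
# asymmetric (ES256) signing keys; legacy projects sign with a shared HS256 secret instead.
SUPABASE_URL = os.environ["SUPABASE_URL"]
jwks = PyJWKClient(f"{SUPABASE_URL}/auth/v1/.well-known/jwks.json")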

def verify(token: str) -> dict:
    key = jwks.get_signing_key_from_jwt(token).key
    return decode(token, key, algorithms=["ES256"], audience="authenticated")

That is the whole verification step. Read the token, fetch the public key from Supabase, and check the signature, the expiry, and the audience. If anything is off, the function raises, and your request handler should turn that into a 401.

Test yourself

If a JWT cannot be revoked easily, how does a logged-in user "log out" in practice?

18. Row-level security

Authentication tells you who is asking. Authorisation tells you what they are allowed to see and do. The naive way to enforce authorisation is in your application code: every handler checks "does this user own this thing?" before reading or writing.

This works until you have ten endpoints, then a hundred, and one developer forgets a check, and now any signed-in user can read any other user's notes.

Row-level security (RLS) moves the check into the database itself. The database refuses to return rows the current user is not allowed to see, no matter what query is sent.

ALTER TABLE notes ENABLE ROW LEVEL SECURITY;

CREATE POLICY notes_owner_only ON notes
  FOR SELECT
  USING (workspace_id IN (
    SELECT workspace_id FROM workspace_members
    WHERE user_id = auth.uid()
  ));

Read this policy as: "you can SELECT a row from notes only if its workspace_id appears in the list of workspaces you are a member of."

The function auth.uid() returns the user id from the JWT attached to the current connection. Supabase wires this in automatically.

Why this is a big deal

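RLS turns a class of bug from "easy mistake" into "blocked by the database." Even if a developer ships an endpoint that forgets to check the workspace, the database refuses to return rows outside the user's workspaces. The guarantee only holds for connections that are subject to the policies. In Postgres the table owner and roles with BYPASSRLS skip them, and so does Supabase's service-role key, so user-facing queries must run under the user's own JWT.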

The trade-off

Logic now lives in two places: SQL policies and application code. You need to think clearly about which checks belong where. The convention is straightforward: anything that the database can express (ownership, membership) lives in policies. Anything that requires application context (rate limits, business rules) lives in code.

A pattern worth internalising

For every workspace-scoped table, write four policies: SELECT, INSERT, UPDATE, DELETE. SELECT and DELETE almost always check the same condition. INSERT and UPDATE often have stricter rules (must be a member with write permission, not just a viewer). Doing this consistently across a schema is one of the markers of careful work.

Test yourself

Suppose a developer accidentally writes `SELECT * FROM notes` with no `WHERE` clause and exposes it through an endpoint. With RLS enabled, what happens?

Part VI. The AI plumbing

Connecting a model to your app

A large language model is just another network service. It takes text in, returns text out. The interesting work is what happens around it: how you prompt it, how you ground its answers in your data, and how you let it use your tools.

This is the part that turns an ordinary CRUD app into something that feels intelligent. It is also the most over-mystified topic in modern engineering. The fundamentals are simple.

19. LLMs as a server you can call

Forget the magic for a minute. A large language model like Claude is a program running on someone else's computer that takes a list of messages and returns the next message in the conversation. You call it the same way you call any other API.

import anthropic

client = anthropic.Anthropic(api_key=API_KEY)

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What is the capital of Germany?"}
    ],
)

print(response.content[0].text)   # e.g. "Berlin." (exact wording varies between calls)

That is the whole call. The library handles authentication; you handle the inputs and outputs.

The shape of a conversation

A conversation is a list of messages. Each message has a role (either "user" or "assistant") and content. To continue a conversation, you send the whole history every time.

messages = [
    {"role": "user", "content": "What is the capital of Germany?"},
    {"role": "assistant", "content": "Berlin."},
    {"role": "user", "content": "And France?"},
]

response = client.messages.create(model="claude-sonnet-4-6", max_tokens=1024, messages=messages)
# "Paris."

The model has no memory between calls. The "memory" in a chat product is just the application keeping the message list and resending it each turn. This is why people say models are "stateless." The state is on your side, not theirs.
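
Here is that bookkeeping as a small sketch. The Chat class and the fake_model stub are hypothetical; the stub stands in for the real API call so the example runs offline, and it records how many messages it receives on each turn.

```python
class Chat:
    """Keeps the conversation on the application side and resends it every turn."""

    def __init__(self, send):
        self.send = send      # function: list of messages -> reply text
        self.messages = []

    def ask(self, text):
        self.messages.append({"role": "user", "content": text})
        reply = self.send(list(self.messages))  # the whole history goes out each time
        self.messages.append({"role": "assistant", "content": reply})
        return reply

# Stub standing in for the model API.
calls = []
def fake_model(messages):
    calls.append(len(messages))
    return f"reply {len(calls)}"

chat = Chat(fake_model)
chat.ask("What is the capital of Germany?")
chat.ask("And France?")
print(calls)  # [1, 3]: the second call carried the first exchange with it
```

The growing message count is also why long conversations get slower and more expensive: every turn pays for the full history again.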

The system prompt

You can give the model a high-priority instruction that sits above the conversation, called the system prompt.

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a senior research analyst. Answer concisely, cite sources.",
    messages=[{"role": "user", "content": "Summarise Amazon's Q1 2026."}],
)

The system prompt is where you put role definitions, output format requirements, and constraints. It applies to the whole conversation.

Tokens, latency, and cost

Providers bill by the token: a small unit of text, roughly the size of a syllable. A typical English word is about 1.3 tokens. Each call has a token cost for the input and a separate (usually higher) cost for the output.

This shapes engineering decisions. If you can answer a question by sending one paragraph of context, do not send ten. Trim, summarise, and pick relevant pieces. Most of the work in building good AI features is reducing what you send.
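
A back-of-the-envelope calculation shows how much trimming matters. The prices below are hypothetical placeholders chosen for round numbers, not any provider's actual rates; the 1.3 tokens-per-word figure is the rule of thumb from above.

```python
TOKENS_PER_WORD = 1.3  # rule of thumb from the text

# Hypothetical prices in dollars per million tokens; check your provider's price list.
INPUT_PRICE = 3.00
OUTPUT_PRICE = 15.00

def estimate_cost(input_words, output_words):
    """Rough dollar cost of one call, given word counts for prompt and reply."""
    input_tokens = input_words * TOKENS_PER_WORD
    output_tokens = output_words * TOKENS_PER_WORD
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000

ten_paragraphs = estimate_cost(input_words=1500, output_words=200)
one_paragraph = estimate_cost(input_words=150, output_words=200)

print(f"{ten_paragraphs:.6f}")  # 0.009750
print(f"{one_paragraph:.6f}")   # 0.004485
```

Under these assumed prices the trimmed prompt costs less than half as much per call, and the difference multiplies across every request your app makes.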

Test yourself

If the model has no memory between calls, how does a chat product appear to remember the conversation?

20. Prompt engineering and structured outputs

A prompt is just text. Prompt engineering is the practice of writing that text so the model produces useful, predictable output.

The whole field can be summed up as: be specific. The model is a very good autocomplete. Tell it exactly what role to play, exactly what format to produce, and exactly what to avoid.

The four ingredients of a good prompt

Role. "You are a senior research analyst at a private equity firm."
Task. "Read the document excerpts below and answer the question."
Constraints. "Cite each fact with [N] markers. If the answer is not in the documents, say so."
Examples. "Here is the question. Here is the document. Here is the kind of answer we want."

Constraints are the most underrated. Most prompt failures are caused by the model doing something the prompt did not forbid.

Structured outputs

Plain text answers are hard for code to consume. Modern models support structured outputs: you describe a JSON schema and the model is constrained to produce JSON that fits.

import json
from pydantic import BaseModel

# Pydantic schema doubles as the output schema
class BriefSection(BaseModel):
    title: str
    body: str
    citation_indices: list[int]

class Brief(BaseModel):
    sections: list[BriefSection]

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    # include the actual JSON schema so the model knows the target shape
    system="Draft an investment memo. Return only JSON matching this schema:\n"
           + json.dumps(Brief.model_json_schema()),
    messages=[{"role": "user", "content": ...}],
)

# Parse the model's text output back into Pydantic (raises ValidationError on a mismatch)
brief = Brief.model_validate_json(response.content[0].text)

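Now your code can iterate over brief.sections with full type safety. Be clear about what this snippet guarantees. The schema is only described in the prompt, so the model can still return malformed or incomplete JSON, and model_validate_json raises when it does. Validation gives you a loud failure instead of silently bad data. Schema-constrained decoding, where the provider offers it, goes a step further and restricts generation to JSON that fits.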
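
Because a prompted model can still return text that is not valid JSON of the right shape, callers often wrap the parse step in a validate-and-retry loop. The sketch below shows that loop using only the standard library in place of Pydantic; call_model is a hypothetical stand-in for the API call, and the stubbed replies are invented to exercise the failure path.

```python
import json

def parse_brief(raw: str) -> dict:
    """Check the shape the Brief schema describes; raise ValueError on any mismatch."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not JSON: {exc}") from exc
    sections = data.get("sections") if isinstance(data, dict) else None
    if not isinstance(sections, list):
        raise ValueError("missing 'sections' list")
    for s in sections:
        ok = (
            isinstance(s, dict)
            and isinstance(s.get("title"), str)
            and isinstance(s.get("body"), str)
            and isinstance(s.get("citation_indices"), list)
            and all(isinstance(i, int) for i in s["citation_indices"])
        )
        if not ok:
            raise ValueError(f"bad section: {s!r}")
    return data

def draft_with_retry(call_model, max_attempts=3):
    """Call the model until its reply validates, feeding the last error back in."""
    last_error = None
    for _ in range(max_attempts):
        raw = call_model(last_error)  # a real caller would add last_error to the prompt
        try:
            return parse_brief(raw)
        except ValueError as exc:
            last_error = str(exc)
    raise RuntimeError(f"no valid brief after {max_attempts} attempts: {last_error}")

# Stub model: the first reply is chatty prose, the second is valid JSON.
replies = iter([
    "Sure! Here is your memo...",
    '{"sections": [{"title": "Overview", "body": "Revenue grew.", "citation_indices": [1]}]}',
])
brief = draft_with_retry(lambda error: next(replies))
```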

Prompt chaining

For complex outputs, one big prompt usually fails. Split the work into smaller prompts that build on each other.

For a research memo, you might chain:

"Read these excerpts. Identify the five most important themes." (small output)
"For each theme, draft a 200 word section. Cite the relevant excerpts." (medium output, calls per theme)
"Read the sections you just wrote. Write a one-paragraph executive summary." (small output)

Each step is a small, focused call. The output of one becomes the input of the next. This pattern, called a chain, gives you better results than a single mega-prompt because each step is easy to evaluate and fix when it goes wrong.
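
The three steps above can be sketched as an ordinary function that passes each output to the next call. The llm argument is any function from prompt text to reply text; fake_llm here is a hypothetical stub with canned answers so the example runs without a model.

```python
def run_chain(excerpts, llm):
    """Themes -> one section per theme -> executive summary."""
    themes = llm(
        f"Read these excerpts. Identify the most important themes, one per line.\n\n{excerpts}"
    ).splitlines()
    sections = [
        llm(f"Draft a 200 word section on the theme '{t}'. Cite the relevant excerpts.\n\n{excerpts}")
        for t in themes
    ]
    summary = llm(
        "Write a one-paragraph executive summary of the following.\n\n" + "\n\n".join(sections)
    )
    return {"themes": themes, "sections": sections, "summary": summary}

# Stub model with canned behaviour; it also records every prompt it was sent.
prompts = []
def fake_llm(prompt):
    prompts.append(prompt)
    if prompt.startswith("Read these excerpts"):
        return "pricing\nmargins"
    if prompt.startswith("Draft"):
        return "section about " + prompt.split("'")[1]
    return "summary of " + str(prompt.count("section about")) + " sections"

memo = run_chain("[1] Prices rose 4%.", fake_llm)
print(len(prompts))     # 4: one for themes, two for sections, one for the summary
print(memo["summary"])  # summary of 2 sections
```

Because each intermediate value is a plain variable, you can log it, inspect it, and rerun a single step when its output looks wrong.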

A practical trick

If a model keeps misbehaving on a step, paste the exact prompt and the bad output into a fresh chat with the model. Ask it: "What is unclear about this prompt?" Models are surprisingly good at debugging their own instructions.

Test yourself

Why is constraining the output to a JSON schema more reliable than asking the model to "respond in JSON" in plain English?

21. Embeddings: turning text into numbers

An embedding is a list of numbers that represents the meaning of a piece of text. Two texts with similar meaning have embeddings that are close together in that number space. Two texts with different meanings have embeddings that are far apart.

Embeddings turn text into geometry. Similarity becomes distance you can measure.

How embeddings get made

Embeddings come from a smaller, separate model called an embedding model. You feed it text, it returns a fixed-length vector (commonly 384, 768, or 1536 numbers).

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-small-en-v1.5")

texts = [
    "The cat sat on the mat.",
    "A feline rested on the rug.",
    "Quarterly revenue grew 14%.",
]

vectors = model.encode(texts)
print(vectors.shape)   # (3, 384)

What you do with them

The simplest use is similarity search. Given a query like "what did management say about pricing?", you embed the query and find the nearest text chunks in your database.

query_vec = model.encode(["What did management say about pricing?"])[0]

# In SQL with pgvector:
#   SELECT id, text FROM document_chunks
#   ORDER BY embedding <=> :query_vec
#   LIMIT 10;

The closest results are the most relevant chunks. They might use completely different words from the query, because embeddings capture meaning, not surface text.

Why this works (a brief, optional aside)

The embedding model is trained on a giant corpus to produce vectors where two pieces of text that often appear in similar contexts (or that mean similar things in human-labelled pairs) end up close in vector space. Once the model is trained, it can do this for any text, including text it has never seen.

Think of it like this: every word and phrase ends up at a coordinate in a 384-dimensional map. Synonyms cluster together. Topics form regions. Concepts form continents.

Test yourself

If a user searches "what did the CEO say about layoffs?" and the document contains the phrase "workforce reduction", will keyword search find it? Will embedding-based search find it? Why?

22. Retrieval-Augmented Generation

Models have two limitations you cannot fix with bigger models. First, they only know what was in their training data. Second, they hallucinate when asked about things they do not know.

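Retrieval-Augmented Generation (RAG) is the technique that addresses both. It supplies knowledge the model was never trained on, and it reduces hallucination by giving the model source text to answer from. The idea is two steps.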

Retrieve. Given the user's question, find the most relevant pieces of your data.
Generate. Send those pieces to the model along with the question, and instruct it to answer using only that material.

The classic RAG pipeline. The model never sees your whole corpus. It sees only the few chunks most relevant to each question.

The pipeline, step by step

Ingest: when a document is uploaded, parse it (PDF, DOCX, TXT), split it into chunks of a few hundred tokens each, embed each chunk, and store the chunks plus their vectors in pgvector.

# illustrative: parse, split_into_chunks, embedder, and db are app-specific helpers
async def ingest(document_id, file_bytes):
    text = parse(file_bytes)
    chunks = split_into_chunks(text, max_tokens=400, overlap=50)
    vectors = await embedder.encode([c.text for c in chunks])
    await db.insert_chunks(document_id, chunks, vectors)

Retrieve: when a user asks a question, embed the question and find the top-K most similar chunks.

async def retrieve(question, document_id, k=8):
    # encode takes a batch and returns a batch, so take the single query vector
    q_vec = (await embedder.encode([question]))[0]
    return await db.search_chunks(document_id, q_vec, top_k=k)

Generate: assemble a prompt that includes the chunks, then call the LLM.

async def answer(question, document_id):
    chunks = await retrieve(question, document_id)
    context = "\n\n".join(
        f"[{i+1}] {c.text}" for i, c in enumerate(chunks)
    )
    response = await llm.complete(
        system="Answer using only the excerpts. Cite with [N].",
        user=f"Excerpts:\n{context}\n\nQuestion: {question}",
    )
    return response, chunks
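
The ingest step calls split_into_chunks without showing it. One plausible implementation is a sliding window with overlap, sketched below. It counts whitespace-separated words as a rough stand-in for tokens and returns plain strings rather than chunk objects; a real pipeline would count with the embedding model's tokenizer and often prefers to break on paragraph or sentence boundaries.

```python
def split_into_chunks(text, max_tokens=400, overlap=50):
    """Sliding window over words; consecutive chunks share `overlap` words."""
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be non-negative and smaller than max_tokens")
    words = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # this window already reached the end of the text
    return chunks

document = " ".join(f"w{i}" for i in range(1000))
chunks = split_into_chunks(document)
print([len(c.split()) for c in chunks])  # [400, 400, 300]
```

The overlap exists so that a sentence falling on a boundary appears whole in at least one chunk, instead of being cut in half in both.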

Citations are the difference between a toy and a tool

Returning an answer is easy. Returning an answer plus the exact passages it came from is what turns a chat box into a research tool. Users can verify the answer. They can click through to the source. A confident hallucination becomes much easier to catch, because every claim points at a passage the user can check.

The trick is consistent labelling. Number each chunk before you send it to the model. Tell the model to cite chunks by number. Render the numbers as superscript footnotes that link back to the chunks. The model only has to learn one rule: "use [N] markers."
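
Both halves of the labelling scheme fit in a few lines: numbering the chunks on the way in, and mapping the [N] markers back to chunks on the way out. The chunk texts and the sample answer below are invented, and the answer deliberately includes a marker that points at a chunk that was never sent.

```python
import re

def build_context(chunks):
    """Number each chunk from 1 so the model can cite it as [N]."""
    return "\n\n".join(f"[{i}] {text}" for i, text in enumerate(chunks, start=1))

def cited_chunks(answer, chunks):
    """Map [N] markers in the answer back to (N, chunk text); drop out-of-range numbers."""
    numbers = sorted({int(n) for n in re.findall(r"\[(\d+)\]", answer)})
    return [(n, chunks[n - 1]) for n in numbers if 1 <= n <= len(chunks)]

chunks = [
    "Revenue grew 14% year over year.",
    "Headcount was flat.",
    "Management raised prices in March.",
]
context = build_context(chunks)
answer = "Revenue grew 14% [1], helped by a price increase [3]. Margins doubled [7]."
sources = cited_chunks(answer, chunks)
print(sources)
# [(1, 'Revenue grew 14% year over year.'), (3, 'Management raised prices in March.')]
```

The dropped [7] is worth surfacing in a real product: a citation with no matching chunk is a cheap signal that the sentence around it may be unsupported.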

Why this is the most important pattern in this note

Almost every "AI feature" in modern apps is some variant of RAG. Search, customer support, document Q&A, code assistants. The plumbing changes; the shape stays. If you internalise this pipeline, you can build most of what people are calling "AI" in 2026.

Test yourself

What goes wrong if your chunks are too small? What goes wrong if they are too large?

23. MCP: letting AI use your tools

RAG sends data to the model. Sometimes you want the reverse: the model needs to do things in your system, like create a note, generate a brief, or look up a company. Tool use is the mechanism for that.

The pattern is simple. You describe a list of available functions to the model: their names, their parameters, and what they do. When the model decides it needs one, it returns a structured "I want to call this function with these arguments" message. Your code runs the function and feeds the result back. The model continues with the result in hand.

tools = [
    {
        "name": "create_note",
        "description": "Add a free-form note to a company in the user's workspace.",
        "input_schema": {
            "type": "object",
            "properties": {
                "company_id": {"type": "string"},
                "text": {"type": "string"},
            },
            "required": ["company_id", "text"],
        },
    }
]

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "Add a note to Pet Supplies Plus saying I called Q1 a beat."}],
)

# response includes a tool_use block with the function name and arguments
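
The other half of the loop, running the requested function and packaging the result, can be sketched as follows. The content blocks are written as plain dicts mirroring the JSON shape of a tool_use block (the SDK returns objects with the same fields), and the block id, company id, and create_note body are hypothetical.

```python
import json

def create_note(company_id: str, text: str) -> dict:
    # Hypothetical stand-in for the real domain function.
    return {"company_id": company_id, "text": text, "saved": True}

TOOL_FUNCTIONS = {"create_note": create_note}

def run_tool_calls(content_blocks):
    """Execute each tool_use block and build the tool_result blocks to send back."""
    results = []
    for block in content_blocks:
        if block["type"] != "tool_use":
            continue  # plain text blocks need no action
        func = TOOL_FUNCTIONS.get(block["name"])
        if func is None:
            output, is_error = f"unknown tool: {block['name']}", True
        else:
            output, is_error = json.dumps(func(**block["input"])), False
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],  # ties the result to the request
            "content": output,
            "is_error": is_error,
        })
    return results

# What a response's content might look like, written out by hand.
content = [
    {"type": "text", "text": "I'll add that note."},
    {"type": "tool_use", "id": "toolu_01", "name": "create_note",
     "input": {"company_id": "c_123", "text": "Called Q1 a beat."}},
]
results = run_tool_calls(content)
```

The results go back to the model as the content of the next user message, after which the model continues the conversation with the outcome in hand.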

MCP: a standard for tool sharing

Tool definitions like the one above used to live inside each application. The Model Context Protocol (MCP) standardises this. An MCP server exposes a set of tools over a small protocol, and any MCP-aware client (like Claude Desktop) can connect, discover the tools, and call them.

The result: a backend can serve its tools to a chat UI it does not own. A user running Claude Desktop can drive your application from outside its web interface, with no extra integration work on either side.

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("my-app")

@mcp.tool()
async def create_note(company_id: str, text: str) -> dict:
    """Add a free-form note to a company in the user's workspace."""
    return await notes.create(company_id, text)

mcp.run()

The decorator is doing all the work. The function's docstring becomes the tool description; its type annotations become the input schema. Same domain logic, two surfaces (HTTP and MCP), zero duplication.
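
To demystify the decorator, here is a toy version that derives a tool description from a function's signature and docstring. It is a sketch of the idea only, not FastMCP's implementation: the tool decorator, REGISTRY, and the small type table are all invented for this example, and it handles just a few primitive types.

```python
import inspect
import typing

JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}
REGISTRY = {}

def tool(func):
    """Register func and build a tool description from its annotations and docstring."""
    hints = typing.get_type_hints(func)
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {"type": JSON_TYPES[hints[name]]}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value means the caller must supply it
    REGISTRY[func.__name__] = {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "input_schema": {"type": "object", "properties": properties, "required": required},
    }
    return func

@tool
def create_note(company_id: str, text: str) -> dict:
    """Add a free-form note to a company in the user's workspace."""
    return {"company_id": company_id, "text": text}

description = REGISTRY["create_note"]
```

The generated description has the same shape as the hand-written tools entry earlier in this module, which is the duplication the decorator removes.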

Test yourself

What is the difference between RAG and tool use? When would you reach for one rather than the other?

Part VII. Modern frontend

From hand-written HTML to a real frontend framework

Once an app gets past a single page, hand-writing HTML and CSS becomes painful. Modern teams use frameworks built on React that handle routing, server-side rendering, asset bundling, and a hundred other moving parts. The most popular one in 2026 is Next.js.

24. Next.js and the App Router

Next.js is a framework built on top of React. It adds three big things React does not give you on its own.

File-based routing. The folder structure inside an app/ directory becomes the URL structure of your site.
Server rendering. Pages can be rendered on the server before being sent to the browser, which makes the first load fast and the page indexable by search engines.
An asset pipeline. CSS, images, fonts, TypeScript, and JavaScript are all bundled, optimised, and cached automatically.

Routing by folders

app/
  page.tsx              // renders at /
  about/
    page.tsx            // renders at /about
  workspace/
    page.tsx            // renders at /workspace
    c/
      [id]/
        page.tsx        // renders at /workspace/c/abc123
  api/
    waitlist/
      route.ts          // HTTP endpoint at /api/waitlist

Brackets in folder names mean "dynamic": [id] matches any value and passes it to the page as a prop. So /workspace/c/abc123 matches app/workspace/c/[id]/page.tsx with id="abc123".

A page is just a component

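// app/workspace/c/[id]/page.tsx
// In Next.js 15 and later, params is a Promise and must be awaited.

export default async function CompanyPage({ params }: { params: Promise<{ id: string }> }) {
  const { id } = await params;
  return (
    <main>
      <h1>Company {id}</h1>
    </main>
  );
}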

Default-export a React component. Next.js wires up the rest.

Layouts

Pages can share structure through layout.tsx files. A layout wraps every page in its folder.

// app/workspace/layout.tsx

export default function WorkspaceLayout({ children }: { children: React.ReactNode }) {
  return (
    <div>
      <Sidebar />
      <main>{children}</main>
    </div>
  );
}

Layouts compose. app/layout.tsx wraps everything; app/workspace/layout.tsx wraps every page under /workspace; nested layouts are nested in the rendered tree.

Test yourself

If you want a sign-in page that does not show the sidebar but everything else under `/workspace` does, where do you put the layout?

25. Server vs client components

This is the most confusing idea in modern Next.js, and the most important. There are two kinds of components, and you have to know which one you are writing.

Server components run on the server. They can call databases directly, read files, use server-side environment variables. They render HTML and ship it to the browser. They cannot use useState, useEffect, or click handlers.

Client components run in the browser. They can use state, effects, click handlers, browser APIs. They cannot call your database directly. They have to talk to the backend over HTTP.

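Next.js renders server components on the server. Client components are also pre-rendered to HTML for the first load, then "hydrated" in the browser, where their JavaScript attaches to that HTML and makes it interactive.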

// Server component (the default), in its own file; db is an app-specific helper
export default async function CompanyPage({ params }: { params: Promise<{ id: string }> }) {
  const { id } = await params;
  const company = await db.getCompany(id);   // runs on server
  return <h1>{company.name}</h1>;
}

// Client component, in a separate file: "use client" must come before any imports
"use client";

import { useState } from "react";

export default function Counter() {
  const [n, setN] = useState(0);
  return <button onClick={() => setN(n + 1)}>{n}</button>;
}

The first line of a client component file is the literal string "use client", placed above any imports. That marker tells Next.js to ship the file, and everything it imports, to the browser.

The mental rule

Server components handle data. Client components handle interaction. If you can render something without needing the user to click or type, make it a server component. Reach for client components only when you need state or events.

A real page mixes both. Server components do the heavy data work; small client islands handle interaction.

Test yourself

Why can a server component await a database call directly while a client component cannot?

26. Tailwind and design tokens

We met Tailwind briefly in Module 3. Now you have enough context to see why it matters in real apps.

The core idea: a fixed set of single-purpose utility classes (bg-white, text-sm, flex, gap-4) you compose to style anything. No naming, no cascade fights, no abandoned CSS files.

<button class="rounded-md bg-ink px-4 py-2 text-sm font-medium text-paper hover:bg-ink/90">
  Save
</button>

Read the classes left to right. Rounded. Dark background. Horizontal and vertical padding. Small text. Medium font weight. Cream-coloured text. Slightly transparent on hover.

Design tokens

Notice the colours: bg-ink and text-paper. Those are not built-in Tailwind classes. They are design tokens defined in a single config file.

// tailwind.config.ts

export default {
  theme: {
    extend: {
      colors: {
        ink: "#1A1410",
        paper: "#F8F3E7",
        accent: "#8C5E22",
        line: "#D9CFB7",
      },
    },
  },
};

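Now bg-ink is available everywhere. Change the hex value in this one file and the whole app re-themes. This is the key insight of modern design systems: visual identity should live in tokens, not in scattered CSS rules. (The file above is the Tailwind v3 config format. Tailwind v4 moves the same tokens into CSS with an @theme block, but the idea is unchanged.)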

The component library trade-off

Some teams use a component library (Material UI, Chakra) that ships pre-built Buttons, Modals, and Forms. Others write everything from primitives with Tailwind. The first is faster to start, slower to customise. The second is slower to start, faster to evolve.

Most teams in 2026 land on a middle path: Tailwind plus a small in-house library of components like <Button> and <Modal>, built on top of unstyled primitives like Radix UI, often starting from the copy-in components that shadcn/ui provides.

Test yourself

Why is changing a design system colour easier with tokens than with raw hex values scattered through the codebase?

27. Talking to your backend safely

The frontend and backend are separate programs that communicate over HTTP. Two questions need a clear answer.

How does a frontend call know what the backend expects, and how do typing errors get caught?
How does the frontend prove who the user is on every call?

One typed API client

The right pattern is a single api.ts file in your frontend that wraps every backend endpoint as a typed function.

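// lib/api.ts
// API_URL and getAuthToken are assumed to be defined elsewhere in the app.

interface Company {
  id: string;
  ticker: string;
  name: string;
}

async function authedFetch<T>(path: string, init: RequestInit = {}): Promise<T> {
  const token = await getAuthToken();
  const headers = new Headers(init.headers);
  // only attach the header when signed in; otherwise the server answers 401
  if (token) headers.set("Authorization", `Bearer ${token}`);
  if (init.body) headers.set("Content-Type", "application/json");
  const res = await fetch(`${API_URL}${path}`, { ...init, headers });
  // fetch does not reject on 4xx/5xx, so check the status explicitly
  if (!res.ok) throw new Error(`${init.method ?? "GET"} ${path} failed with ${res.status}`);
  return res.json() as Promise<T>;
}

export const api = {
  getCompany: (id: string) => authedFetch<Company>(`/companies/${id}`),

  createCompany: (body: { ticker: string; name: string }) =>
    authedFetch<Company>("/companies", {
      method: "POST",
      body: JSON.stringify(body),
    }),
};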

Every component that needs to talk to the backend imports from this file. There are no inline fetch calls anywhere else. This gives you one place to add auth headers, retries, error handling, and request logging.

Sharing types with the backend

The Company interface above duplicates what the backend's Pydantic schema already says. There are two ways to keep them in sync.

Hand-write both, keep them parallel. Simple. Drifts over time.
Generate the TypeScript types from FastAPI's OpenAPI document. Tools like openapi-typescript read the auto-generated /openapi.json from your FastAPI server and produce a TypeScript file. The types stay in sync with the backend as long as you regenerate them whenever the API changes, for example as a step in CI.

For small teams, hand-writing is fine and easier to review. For larger ones, generation pays off quickly.

The auth piece

Every request needs an Authorization: Bearer <jwt> header. The frontend's auth library (typically Supabase JS) keeps the JWT in the browser and exposes a function to fetch it. The authedFetch wrapper above grabs it on every call.

If the token is expired, the library refreshes it automatically. If the user is signed out, the function returns null and the request goes through without auth, hitting a 401 from the server.

Test yourself

Why is a single `api.ts` file better than letting components call `fetch` directly wherever they need data?

Part VIII. Shipping

From a working app to a real URL

An app on your laptop is not an app. The work to get from "runs locally" to "runs at a domain that anyone in the world can visit" is a real chunk of engineering. This part covers the moving parts.

28. Containers and Docker

An application has dependencies: a Python version, system libraries, a model file, an environment. Getting all of that to be the same on your laptop, on a teammate's laptop, on a CI runner, and on the production server is famously painful.

A container is a packaging format that wraps your application and its dependencies into a single bundle that runs identically anywhere a container runtime exists. Docker is the most common tool for building and running containers.

A Dockerfile

A Dockerfile is a recipe for building a container image: a file that says, "start from this base, copy these files in, install these packages, run this command on startup."

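FROM python:3.12-slim

# the slim base image does not ship uv, so copy the binary from its official image
COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv

WORKDIR /app

# install dependencies first so the docker layer cache works
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-dev --no-install-project

# then copy the application code
COPY app ./app

# uv installs into /app/.venv; put its executables (uvicorn) on the PATH
ENV PATH="/app/.venv/bin:$PATH"

EXPOSE 8000

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]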

Run docker build -t my-app . and you get an image. Run docker run -p 8000:8000 my-app and the application starts in a fresh container, listening on port 8000.

Why this matters in production

Platforms like Fly.io, AWS ECS, and Google Cloud Run, and orchestrators like Kubernetes, all use the container image as their unit of deployment. Once your app builds into an image, deploying to any of them is mostly the same set of moves. The image is portable in a way that a "Python program with these dependencies" is not.

Multi-stage builds

Production images should be small. A common trick: do the heavy install work in a "builder" image, then copy only the artefact you need into a slim "runner" image.

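FROM python:3.12 AS builder
COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv
WORKDIR /app
COPY pyproject.toml uv.lock ./
# build steps here: install everything into /app/.venv
RUN uv sync --frozen --no-dev --no-install-project

FROM python:3.12-slim AS runner
WORKDIR /app
# copy only the installed environment, not the build tools
COPY --from=builder /app/.venv /app/.venv
COPY app ./app
ENV PATH="/app/.venv/bin:$PATH"
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]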

Final image: a few hundred megabytes instead of several gigabytes. Faster to push, faster to pull, smaller attack surface.

Test yourself

What problem does a container solve that a `requirements.txt` file does not?

29. Hosting on Fly and Vercel

You have a container for the backend and a Next.js app for the frontend. They go to different places.

Fly.io for the backend

Fly.io runs your Docker container on small virtual machines distributed around the world. You configure it with a fly.toml file at the root of your repo and deploy with a single command.

# fly.toml
app = "my-api"
primary_region = "iad"

[build]
  dockerfile = "Dockerfile"

[http_service]
  internal_port = 8000
  force_https = true
  min_machines_running = 1

[[vm]]
  size = "shared-cpu-2x"
  memory = "2gb"

Run fly deploy. Fly builds your image (or uses a cached one), uploads it to its registry, starts a machine, and routes traffic to it. Total time from push to live URL: a few minutes.

Vercel for the frontend

Vercel is the company that makes Next.js. They host Next.js apps natively. Push to a GitHub repository and Vercel builds and deploys it automatically. Each pull request gets its own preview URL.

You do not write a Dockerfile for the frontend. Vercel knows how to build a Next.js project. You give it environment variables (the API URL, the Supabase keys) through their dashboard and the build picks them up.

Why split the deployment

The backend has heavy dependencies (Python, ML models, system libraries) and lives in long-running processes that need stable IP addresses and persistent disk. Fly is designed for that.

The frontend is mostly static files plus a handful of small server functions. Vercel is designed for that, with a global CDN that caches the static parts at edge locations near your users.

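Putting both on the same machine is possible, but you give up what each platform is good at: the frontend loses the CDN and the backend shares resources with page rendering. Splitting them is the modern default.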

Test yourself

If your frontend and your backend run on different domains, what new concern shows up that did not exist when they were on the same domain?

30. DNS and TLS certificates

To make app.example.com point at your Vercel deployment, you need two things: a DNS record and a TLS certificate.

DNS in one paragraph

The internet runs on IP addresses. Humans use names. DNS (Domain Name System) is the lookup table that maps names to addresses. When you type a URL, your browser asks DNS for the IP address, then connects to that address.

You configure DNS with whoever hosts your domain's DNS records, usually your domain registrar (GoDaddy, Cloudflare, Namecheap). Two record types matter most.

A record: "this name maps to this IPv4 address."
CNAME record: "this name is an alias for that other name."

app.example.com   A      216.198.79.1
api.example.com   CNAME  my-api.fly.dev

The first says "send traffic for app.example.com to this Vercel IP." The second says "send traffic for api.example.com wherever my-api.fly.dev currently points."
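
You can watch the lookup happen from Python. This is a minimal sketch using the standard library's resolver interface; resolve_ipv4 is a name invented for the example. The resolver follows CNAME aliases for you, so asking for an aliased name returns the addresses of whatever it currently points at.

```python
import socket


def resolve_ipv4(hostname: str) -> list[str]:
    """Return the IPv4 addresses the system resolver reports for a name."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (ip, port) pair.
    return sorted({info[4][0] for info in infos})


# "localhost" resolves without touching the network, so it is safe to run anywhere.
print(resolve_ipv4("localhost"))
```

Swap in your own domain after adding the records above and you should see the address your A record names, once the change has propagated.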

TLS certificates

Modern browsers mark any site that does not use HTTPS as "Not secure" and withhold many features from it. HTTPS requires a TLS certificate: a small signed file that proves the server controls the domain it claims to serve.

Until Let's Encrypt launched in late 2015, you generally had to buy these. Today they are free, automatic, and managed by the hosting provider. Vercel and Fly both issue and renew certificates from Let's Encrypt automatically once you point DNS at them. You do nothing beyond adding the domain.

The whole flow, end to end

You buy example.com from a registrar.
You add a DNS record pointing app.example.com at Vercel.
You add the domain in your Vercel project's settings.
Vercel asks Let's Encrypt for a certificate, proves it owns the domain (because the DNS now points at it), and installs the cert.
The browser visits https://app.example.com. It looks up DNS, gets Vercel's IP, opens a TLS connection, sees a valid certificate, decrypts the response.
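
The last step of that flow, the part the browser does, can be sketched with Python's ssl module. This is an illustration under assumptions rather than anything the app itself needs: make_client_context and fetch_certificate are hypothetical helper names, and calling fetch_certificate requires network access to a real HTTPS host.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Default context: verifies the certificate chain and the hostname."""
    return ssl.create_default_context()


def fetch_certificate(hostname: str, port: int = 443) -> dict:
    """Open a TLS connection and return the server's validated certificate."""
    context = make_client_context()
    with socket.create_connection((hostname, port), timeout=10) as sock:
        # server_hostname drives both SNI and the hostname check.
        # The handshake raises ssl.SSLCertVerificationError if the
        # certificate is invalid, expired, or issued for another name.
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()


context = make_client_context()
print(context.check_hostname, context.verify_mode == ssl.CERT_REQUIRED)
```

Calling fetch_certificate on your own domain returns a dictionary whose issuer and notAfter fields show who signed the certificate and when it expires, which is a quick way to confirm the automatic issuance worked.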

Once you have done this two or three times, it stops feeling like a chore and starts feeling like infrastructure.

Test yourself

If you change a DNS record from one IP to another, why might users still see the old site for several hours?

31. Logs and observability

The hardest moment in running a production system is the first time it breaks. You cannot SSH into a thousand containers. You need the system to tell you, after the fact, what it was doing.

Structured logs

Every line your code writes should be a structured event, not a plain English sentence.

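# bad: hard to search
print(f"Created brief for company {company_id} in {duration} ms")

# good: machine-readable (structlog-style; the stdlib logger does not accept keyword fields)
import structlog

log = structlog.get_logger()
log.info("brief.created", company_id=company_id, duration_ms=duration)

With structlog's JSON renderer configured, the second form is emitted as a JSON object that your log system can index. You can then search for "every time brief.created took more than 1000 ms" trivially.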
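
If you want to see the mechanism without installing anything, the same idea can be sketched with the standard library alone. JsonFormatter and the "fields" key are conventions invented for this example, not part of the logging module. The in-memory stream stands in for stdout or a log shipper.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        event = {"event": record.getMessage(), "level": record.levelname}
        # Fields passed via extra={"fields": {...}} are attached to the record.
        event.update(getattr(record, "fields", {}))
        return json.dumps(event)


stream = io.StringIO()  # stands in for stdout or a log shipper
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
log = logging.getLogger("briefs")
log.setLevel(logging.INFO)
log.addHandler(handler)
log.propagate = False  # keep the demo output out of the root logger

log.info("brief.created", extra={"fields": {"company_id": "c1", "duration_ms": 420}})
log.info("brief.created", extra={"fields": {"company_id": "c2", "duration_ms": 1850}})
log.info("brief.failed", extra={"fields": {"company_id": "c3", "duration_ms": 90}})

# "Every time brief.created took more than 1000 ms" becomes a simple filter.
events = [json.loads(line) for line in stream.getvalue().splitlines()]
slow = [e for e in events if e["event"] == "brief.created" and e["duration_ms"] > 1000]
print(slow)
```

The filter at the end is the part an English sentence cannot give you: the duration is a number in a named field, so no regular expression is needed to pull it out.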

Three signals every system needs

Logs. What happened, in order. The story.
Metrics. How often, how fast, how much. The aggregates.
Traces. One request, every step it touched, with timings. The forensic detail.

For most apps, logs alone get you 80% of the way. Add metrics when you need to alert (CPU, error rate, queue depth). Add traces when you have a microservice architecture and need to follow a request across services.

LLM observability

AI features add a wrinkle: you also want to record every prompt sent to the model, every response, the tokens used, and the latency. Tools like Langfuse or Helicone are observability layers built specifically for LLM calls. They give you a dashboard of every conversation, what the model said, what it cost, and how long it took.

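from langfuse import Langfuse

lf = Langfuse()  # reads the LANGFUSE_* keys from the environment

# Illustrative: written against the v2 Python SDK; newer SDK versions expose a different tracing API.
trace = lf.trace(name="draft_brief")
response = client.messages.create(...)
trace.generation(
    name="llm.call",
    model=response.model,
    usage={
        "input": response.usage.input_tokens,
        "output": response.usage.output_tokens,
    },
)
lf.flush()  # send buffered events before the process exits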

Without this, debugging a misbehaving prompt means asking users to send screenshots. With it, every call is captured in one place.
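
Underneath the dashboard, these tools record a handful of fields per call. The sketch below shows that record being built by hand, assuming nothing beyond the standard library. FakeResponse, fake_llm, and observed_call are hypothetical names, and the word-count "tokens" are a stand-in for the real counts an LLM API returns.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Usage:
    input_tokens: int
    output_tokens: int


@dataclass
class FakeResponse:
    """Hypothetical stand-in for an LLM SDK response object."""
    model: str
    text: str
    usage: Usage


@dataclass
class LlmCallRecord:
    name: str
    model: str
    prompt: str
    output: str
    input_tokens: int
    output_tokens: int
    latency_ms: float


records: list[LlmCallRecord] = []  # a real system would ship these to a store


def observed_call(name: str, prompt: str,
                  call: Callable[[str], FakeResponse]) -> FakeResponse:
    """Run an LLM call and record prompt, output, token counts, and latency."""
    start = time.perf_counter()
    response = call(prompt)
    latency_ms = (time.perf_counter() - start) * 1000
    records.append(LlmCallRecord(
        name, response.model, prompt, response.text,
        response.usage.input_tokens, response.usage.output_tokens, latency_ms,
    ))
    return response


def fake_llm(prompt: str) -> FakeResponse:
    # Pretend one token per whitespace-separated word, purely for the demo.
    return FakeResponse("demo-model", "A short brief.", Usage(len(prompt.split()), 3))


observed_call("draft_brief", "Summarise the latest filing for ACME", fake_llm)
print(records[0])
```

Wrapping the call in one function means every feature that talks to the model is recorded the same way, which is the same argument that justified a single api.ts on the frontend.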

Test yourself

Why are JSON-structured logs more useful than English log lines for any system bigger than a single process?

Part IX. Reading a real app

Putting the pieces back together

You now have a vocabulary for every layer in a modern AI-powered application. The last three modules zoom out, draw the whole architecture in one diagram, follow a single request from click to answer, and rehearse explaining the system to a senior engineer.

32. Architecture in one diagram

Almost every web application built in 2026 has the same shape. The technologies vary; the boxes are the same.

The standard shape. Once you see it, every modern web app looks like a variation on this picture.

Five things to notice about this architecture.

One database. Postgres holds structured data, vectors, full-text search, and auth. You do not need three different stores.
Two hosts. Frontend on a CDN-fronted serverless platform, backend in long-running containers. Each host is good at one thing.
External services are explicit. The LLM, the data providers, the email service. They live outside your trust boundary; the backend mediates every call.
The browser cannot talk to most things. It only knows the frontend host and the public backend API. It never connects to the database or the LLM directly. The frontend server or the backend talks to the database. The backend talks to the LLM. Layered.
Background work is not on the diagram. A scheduler inside the backend (or a separate worker process) handles slow tasks. From the outside it looks the same.

Test yourself

Draw this diagram from memory. Where do JWTs flow? Where do RLS checks happen? Where do embeddings get computed?

33. A single request, traced

Every concept in this note finally connects when you trace one request through the whole system. Imagine a user clicks the "Generate brief" button on a company page. Here is what happens.

The browser fires a click event. A React click handler calls api.generateBrief(companyId).
The function builds a POST /briefs/generate request with the company id in the body and a JWT in the Authorization header.
The browser sends the request over HTTPS to api.example.com. DNS resolves to the backend host. TLS encrypts the body.
The backend receives the request. FastAPI matches the path to a handler. A dependency runs first: it reads the JWT, fetches the public key from the auth provider, verifies the signature, and decodes the user id.
The handler runs. It validates the body with Pydantic. It opens an async database session. It checks (in code, not in policy) that the user is a member of the workspace owning the company.
The handler retrieves the relevant document chunks from the database. It embeds the query for the brief, runs a pgvector similarity search against the stored chunk embeddings, and picks the top eight chunks.
The handler builds a prompt: a system message with the role and constraints, a user message with the chunks and the request. It calls the Anthropic API.
Anthropic streams back the brief. The handler parses the JSON output into a Pydantic Brief. It writes the brief and its sections to Postgres in a transaction.
It returns the brief object to the browser as JSON. FastAPI serialises it via Pydantic.
The frontend receives the response. The React component's state updates. React re-renders the component and writes the new brief into the DOM, where the user sees it appear.
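
The retrieval step in the trace above (embed, search, take the top chunks) is the least familiar one, so here is a minimal sketch of what the similarity search computes. In the real app pgvector does this inside Postgres with an index. The plain-Python version below only illustrates the ranking. The three-dimensional vectors and chunk names are made up for the example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query: list[float], chunks: dict[str, list[float]], k: int) -> list[str]:
    """Return the ids of the k chunks most similar to the query embedding."""
    ranked = sorted(
        chunks,
        key=lambda chunk_id: cosine_similarity(query, chunks[chunk_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy 3-dimensional embeddings; real ones have hundreds or thousands of dimensions.
chunks = {
    "revenue": [0.9, 0.1, 0.0],
    "headcount": [0.1, 0.9, 0.0],
    "lawsuit": [0.0, 0.2, 0.9],
}
query = [0.8, 0.3, 0.0]  # hypothetical embedding of the request
print(top_k(query, chunks, k=2))  # ['revenue', 'headcount']
```

The query points mostly along the first axis, so the chunk that does the same ranks first. With k set to eight and real embeddings, this is the list of chunks that ends up in the prompt.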

Three observations about this trace.

Most of the steps are not about AI. The AI is one step out of ten. Auth, validation, database, serialisation, rendering. That is what software engineering looks like.
The trust boundary is the JWT verification step. Before that line, the request is untrusted user input. After it, the handler knows who is calling.
The LLM call is treated like any other slow network call. It is awaited, it can fail, it has a timeout, its tokens are tracked. Nothing about it is mystical.
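
The third observation fits in a few lines of code. This is a sketch using asyncio from the standard library, with fake_llm_call standing in for the real API request. The delays and timeouts are arbitrary values chosen so the demo runs quickly.

```python
import asyncio


async def fake_llm_call(delay_s: float) -> str:
    """Hypothetical stand-in for an awaited LLM API request."""
    await asyncio.sleep(delay_s)
    return "brief text"


async def call_with_timeout(delay_s: float, timeout_s: float) -> str:
    """Await the call, but give up and report failure after timeout_s seconds."""
    try:
        return await asyncio.wait_for(fake_llm_call(delay_s), timeout=timeout_s)
    except asyncio.TimeoutError:
        # In a real handler this would become a 504 or a retry, not a string.
        return "timed out"


print(asyncio.run(call_with_timeout(delay_s=0.01, timeout_s=1.0)))  # brief text
print(asyncio.run(call_with_timeout(delay_s=1.0, timeout_s=0.05)))  # timed out
```

Nothing in this code knows the slow call is an LLM. The same wrapper would suit a payment API or a data provider, which is exactly the point of the observation.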

If you can do this, you can do most of the job

If you can describe the path of one request through this stack, in your own words, you have the mental model that 90% of web engineering rests on. Every framework you learn after this is a different way of expressing the same steps.

Test yourself

Pick a different feature, like uploading a document, and trace the full path from click to "ready" through every layer.

34. Explaining it to a senior engineer

Senior engineers do not want a tour of every file. They want to see your mental model. Three minutes of clear explanation beats thirty minutes of code walkthrough.

Here is a script you can practise. The bracketed parts are what you swap in for your specific app.

The 90-second pitch

"It is a [research workspace, customer support tool, internal admin] for [audience]. The frontend is Next.js with TypeScript on Vercel. The backend is FastAPI with async SQLAlchemy on Fly.io. The database is Postgres with the pgvector extension, fronted by Supabase for auth. The LLM calls go to Anthropic. Documents are parsed, chunked, and embedded on upload, then stored in pgvector. The two interesting features are [feature one] and [feature two], both of which are RAG with citations. There is a small in-process scheduler that does background work like [thesis re-scoring, narrative-diff detection]."

The three architectural decisions you should be able to defend

If you give the pitch above, the senior engineer will pick three things and ask, "why did you do that?" Have answers ready.

Why Postgres for vectors instead of a dedicated vector database? Because keeping the data in one store eliminates a distributed system. Joins between vectors and rows are trivial. RLS works the same way for both. Operationally simpler.
Why server-side rendering on the frontend? Because the first page load is faster and search engines can index it. Client-side rendering reduces server work, but for a dashboard-style app the trade-off favours SSR.
Why an LLM at all, and where does it actually add value? The LLM is good at synthesising unstructured text into structured output. It is bad at arithmetic and at facts not in its context. So you use it for drafting and summarisation, with strict citations to the source documents, and you avoid asking it for anything that should come from a deterministic calculation.

Words to avoid

"AI" is too vague. Say what specifically: an LLM, an embedding model, a classifier. "Magic" is what people say when they have not understood the system. "Just works" is a smell. Real systems do not just work; they have explicit failure modes you have planned for.

What to say when you do not know

"I do not know that yet. My best guess is X, because Y. I can verify by Z." A senior engineer trusts that answer more than confident bluffing.

The final test

Open the architecture diagram from Module 32. Without referring to anything else, talk through the diagram for two minutes: what each box does, what flows between them, what could go wrong, and one thing you would design differently if you started over. If you can do this fluently, you are ready.

Where to go from here

This note covered the breadth. Depth comes from building. Pick one of these paths.

Build a clone. Pick a small product you use and rebuild it with this stack. Make every decision yourself. The pain points are where the learning lives.
Read a real codebase. Open a serious open-source project (Supabase, FastAPI, Next.js itself) and read the source until you can explain a non-trivial subsystem.
Specialise. Pick the part of the stack you find most interesting. Backend distributed systems. AI evaluation and RAG quality. Frontend animation and performance. Each is a career.

The hardest part of learning this stack is believing that the pieces are knowable. They are. The same five or six ideas keep coming back: state on one side, requests on the other, schemas on the boundary, async in the middle, and a few external services that the rest of your code mediates.

You can read any of it. The rest is reps.

Final prompt

In one tweet-length paragraph: what is this app, what stack does it use, what is the most interesting decision you made, and what would you do differently next time?
