Database Normalization: 1NF, 2NF & 3NF Made Simple
Normalization is just organising tables so each fact lives in exactly one place. Learn 1NF, 2NF, and 3NF with a single example refactored step by step — and when to denormalize.
Normalization has a reputation for being academic, but the idea is simple and practical: organise your tables so that every fact is stored in exactly one place. Do that and a whole category of bugs — contradictory data, lost rows, impossible updates — simply cannot happen. Keep the SQL cheat sheet open as you read.
What Normalization Is
Normalization is the process of splitting data into related tables to eliminate redundancy. Instead of repeating a customer’s address on every one of their orders, you store the customer once and link orders to them with a foreign key. The "normal forms" (1NF, 2NF, 3NF) are progressively stricter rules for how well you have done that.
Splitting out redundancy — A flat table repeats the same customer on every row. Normalizing stores each customer once and links the rest back by id.
The Problem: Anomalies
Picture a single flat orders table that repeats the customer name, email, and address on every row. That redundancy creates three classic anomalies — and they are the reason normalization exists.
- Update anomaly — a customer changes email, and you must update every one of their order rows or the data contradicts itself
- Insertion anomaly — you cannot add a customer until they place an order, because their data only lives in the orders table
- Deletion anomaly — deleting a customer’s last order also erases the only record of that customer
First Normal Form (1NF)
A table is in 1NF when every column holds a single, atomic value and each row is unique. No comma-separated lists in one cell, no repeating groups of columns like phone1, phone2, phone3.
-- NOT 1NF: multiple phones jammed into one column
-- users(id, name, phones) -> '555-1111, 555-2222'
-- 1NF: one phone per row in a related table
CREATE TABLE user_phones (
user_id bigint REFERENCES users(id),
phone text NOT NULL,
PRIMARY KEY (user_id, phone)
);
Second Normal Form (2NF)
2NF applies when a table has a composite primary key. It requires that every non-key column depend on the whole key, not just part of it. If a column depends on only one piece of the composite key, it belongs in a separate table.
-- NOT 2NF: product_name depends only on product_id,
-- not on the full (order_id, product_id) key
-- order_items(order_id, product_id, product_name, quantity)
-- 2NF: product_name moves to the products table
CREATE TABLE products (
id bigint PRIMARY KEY,
name text NOT NULL
);
CREATE TABLE order_items (
order_id bigint REFERENCES orders(id),
product_id bigint REFERENCES products(id),
quantity int NOT NULL,
PRIMARY KEY (order_id, product_id)
);
Third Normal Form (3NF)
3NF removes transitive dependencies: a non-key column should not depend on another non-key column. If you store a zip code and also the city and state that the zip determines, the city/state depend on the zip, not on the row’s key — so they belong elsewhere.
-- NOT 3NF: city and state depend on zip, not on user id
-- users(id, name, zip, city, state)
-- 3NF: factor the zip -> city/state fact into its own table
CREATE TABLE zip_codes (
zip text PRIMARY KEY,
city text NOT NULL,
state text NOT NULL
);
-- users(id, name, zip REFERENCES zip_codes(zip))
When to Denormalize
Normalization optimizes for correct writes; it can cost you read performance, because answering a question means joining many tables. Denormalization deliberately reintroduces some redundancy — a cached count, a duplicated label — to make heavy read paths faster. It is a conscious trade-off, not a license to be sloppy.
Common Mistakes & Tips
- Storing lists in a single column (comma-separated) — a 1NF violation that makes querying painful
- Repeating descriptive data (names, labels) across many rows instead of linking to a lookup table
- Denormalizing early "for speed" before measuring whether reads are actually slow
- Denormalizing without a way to keep the copies consistent
- Treating the normal forms as dogma — 3NF is the practical target, not a religion
Good schema design is invisible when it works and painful when it does not. Practice querying normalized datasets to feel why the structure pays off.