Top 25 SQL Interview Questions (With Answers)
The 25 questions that actually come up — from DELETE vs TRUNCATE to window functions, views, and self-joins — with verified, dialect-accurate answers and the follow-ups interviewers ask next.
These 25 questions appear most often across data-analyst, backend, and data-engineering interviews. Each answer is verified and notes where databases actually differ — because the difference between "mostly right" and "exactly right" is what gets remembered. Examples target PostgreSQL unless noted. For deeper dives, see SQL JOINs Explained and SQL Window Functions. Keep the SQL cheat sheet open while you study — it covers every syntax pattern these questions reference.
1. What are the categories of SQL commands?
SQL statements group into families by what they do:
- DDL (Data Definition): CREATE, ALTER, DROP, TRUNCATE — define or change structure
- DML (Data Manipulation): INSERT, UPDATE, DELETE — change data
- DQL (Data Query): SELECT — read data (often folded into DML)
- DCL (Data Control): GRANT, REVOKE — permissions
- TCL (Transaction Control): COMMIT, ROLLBACK, SAVEPOINT — manage transactions
2. PRIMARY KEY vs UNIQUE vs FOREIGN KEY?
- PRIMARY KEY: uniquely identifies a row, is implicitly NOT NULL, and there is exactly one per table
- UNIQUE: enforces uniqueness too, but allows NULLs and you can have many per table
- FOREIGN KEY: references a key in another table and enforces referential integrity
CREATE TABLE orders (
id INT PRIMARY KEY,
email TEXT UNIQUE,
user_id INT REFERENCES users(id) -- foreign key
);
3. DELETE vs TRUNCATE vs DROP?
- DELETE — DML; removes rows (optionally with WHERE), fires row triggers, is logged per row, and rolls back inside a transaction
- TRUNCATE — DDL; removes ALL rows fast by deallocating pages, takes no WHERE, resets identity/sequence, and skips row triggers
- DROP — DDL; removes the entire table, its data, indexes, and structure
4. UNION vs UNION ALL?
Both stack two result sets that have the same number of columns and compatible types. UNION removes duplicate rows (which requires a sort/hash pass); UNION ALL keeps every row and is therefore faster.
SELECT name FROM customers
UNION ALL -- keep duplicates, no dedup cost
SELECT name FROM leads;
5. Explain the four JOIN types.
- INNER JOIN — only rows that match in both tables
- LEFT JOIN — all left rows; right columns are NULL where there is no match
- RIGHT JOIN — all right rows; left columns are NULL where there is no match
- FULL OUTER JOIN — all rows from both sides, matched where possible
SELECT u.name, o.total
FROM users u
LEFT JOIN orders o ON o.user_id = u.id;
-- users with no orders still appear, with o.total = NULL
6. WHERE vs HAVING?
WHERE filters individual rows before aggregation. HAVING filters groups after aggregation. Aggregate functions are not allowed in WHERE.
SELECT user_id, SUM(total) AS revenue
FROM orders
WHERE status = 'paid' -- filter rows first
GROUP BY user_id
HAVING SUM(total) > 1000; -- filter groups after
7. How does SQL handle NULL?
NULL means unknown. Any comparison with NULL yields NULL (not TRUE/FALSE), so you must use IS NULL / IS NOT NULL — never = NULL.
-- Wrong: never returns rows
SELECT * FROM users WHERE phone = NULL;
-- Correct
SELECT * FROM users WHERE phone IS NULL;
-- Replace NULL with a fallback
SELECT name, COALESCE(phone, 'N/A') AS phone FROM users;
8. What is the logical execution order of a query?
SQL is written SELECT-first but evaluated in a different order:
- FROM / JOIN — build the row source
- WHERE — filter rows
- GROUP BY — bucket rows
- HAVING — filter buckets
- SELECT — compute columns and aliases
- DISTINCT — drop duplicate rows
- ORDER BY — sort
- LIMIT / OFFSET — slice
9. What is normalization (1NF, 2NF, 3NF)?
Normalization organizes tables to reduce redundancy and avoid update anomalies.
- 1NF: atomic values, no repeating groups — each cell holds one value
- 2NF: 1NF + every non-key column depends on the WHOLE primary key (removes partial dependency on a composite key)
- 3NF: 2NF + no non-key column depends on another non-key column (removes transitive dependency)
- Denormalization: deliberately reintroducing redundancy to speed up reads
10. CTE vs subquery — and what is a correlated subquery?
A subquery is a query nested inside another. A CTE is a named result set defined with WITH, which makes multi-step logic readable and reusable. A correlated subquery references the outer query and re-runs per outer row. See Subqueries vs CTEs for the full breakdown.
-- CTE: named, readable, reusable
WITH pro_users AS (
SELECT id FROM users WHERE plan = 'pro'
)
SELECT * FROM orders WHERE user_id IN (SELECT id FROM pro_users);
-- Correlated subquery: runs once per outer row
SELECT o.id FROM orders o
WHERE o.total > (
SELECT AVG(o2.total) FROM orders o2 WHERE o2.user_id = o.user_id
);
11. EXISTS vs IN?
Both test membership. EXISTS stops at the first match and is NULL-safe. IN materializes the list; with a large or NULL-containing subquery it can be slower or behave unexpectedly.
-- EXISTS: short-circuits, NULL-safe
SELECT name FROM users u
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);
12. When do you add an index, and what are index types?
- Add: foreign keys, columns frequently used in WHERE / JOIN / ORDER BY
- Skip: low-cardinality columns (booleans) and small tables
- Clustered index: defines the physical row order — at most one per table
- Non-clustered index: a separate structure of pointers — many allowed
- Cost: every index slows INSERT/UPDATE/DELETE and uses storage
13. ROW_NUMBER vs RANK vs DENSE_RANK?
SELECT score,
ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num, -- 1,2,3,4 (always unique)
RANK() OVER (ORDER BY score DESC) AS rnk, -- 1,2,2,4 (gap after tie)
DENSE_RANK() OVER (ORDER BY score DESC) AS dense -- 1,2,2,3 (no gap)
FROM scores;
14. What are the ACID properties?
- Atomicity — a transaction is all-or-nothing
- Consistency — it moves the database from one valid state to another; constraints hold
- Isolation — concurrent transactions don't see each other's intermediate state
- Durability — once committed, data survives crashes (written to disk / WAL)
15. Find the Nth highest salary.
DENSE_RANK handles ties correctly and is the preferred answer; OFFSET is the quick alternative.
-- Nth highest (here N = 3), tie-safe
WITH ranked AS (
SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
FROM employees
)
SELECT DISTINCT salary FROM ranked WHERE rnk = 3;
-- Quick version
SELECT DISTINCT salary FROM employees
ORDER BY salary DESC LIMIT 1 OFFSET 2;
16. Find and remove duplicate rows.
-- Find
SELECT email, COUNT(*) AS cnt
FROM users GROUP BY email HAVING COUNT(*) > 1;
-- Remove, keeping the lowest id per email
DELETE FROM users
WHERE id NOT IN (SELECT MIN(id) FROM users GROUP BY email);
17. Find customers who never placed an order.
-- Anti-join with LEFT JOIN + IS NULL
SELECT u.id, u.name
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE o.id IS NULL;
-- Equivalent with NOT EXISTS (NULL-safe)
SELECT id, name FROM users u
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);
18. Write a running total.
SELECT order_date, total,
SUM(total) OVER (
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM orders;
19. Get the top 3 rows per group.
WITH ranked AS (
SELECT product_name, category, sales,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY sales DESC) AS rn
FROM products
)
SELECT product_name, category, sales
FROM ranked WHERE rn <= 3;
20. Query a self-referencing table (employee → manager).
-- employees(id, name, manager_id) where manager_id -> id
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id
ORDER BY manager, employee;
21. What is the difference between a view and a materialized view?
- View: a stored SELECT query that runs every time you query it — always shows fresh data, never stores rows
- Materialized view: a snapshot of the query result stored on disk — fast to read but must be manually or automatically refreshed
- Use a view when freshness is critical; use a materialized view when the query is expensive and acceptable to be slightly stale
-- Regular view: executes on every SELECT
CREATE VIEW active_users AS
SELECT * FROM users WHERE active = true;
-- Materialized view: stored on disk, refresh manually
CREATE MATERIALIZED VIEW monthly_revenue AS
SELECT DATE_TRUNC('month', created_at) AS month,
SUM(total) AS revenue
FROM orders GROUP BY 1;
REFRESH MATERIALIZED VIEW monthly_revenue; -- update the snapshot
22. Find the product with the highest sales in each category.
The pattern is: rank within partition, then filter to rank = 1. ROW_NUMBER handles ties by picking one arbitrarily; RANK keeps all tied winners.
WITH ranked AS (
SELECT category,
product_name,
SUM(sales) AS total_sales,
RANK() OVER (
PARTITION BY category
ORDER BY SUM(sales) DESC
) AS rnk
FROM products
GROUP BY category, product_name
)
SELECT category, product_name, total_sales
FROM ranked
WHERE rnk = 1;
23. What is a window frame — ROWS vs RANGE?
A window frame specifies which rows relative to the current row are included in the window function calculation.
- ROWS: counts physical rows — ROWS BETWEEN 2 PRECEDING AND CURRENT ROW includes exactly the 2 rows before and the current row
- RANGE: counts rows with the same ORDER BY value as the current row — can include more rows on ties
- UNBOUNDED PRECEDING: from the start of the partition
- CURRENT ROW: up to (and including) the current row
SELECT order_date, amount,
-- 3-day moving average (physical rows)
AVG(amount) OVER (
ORDER BY order_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS moving_avg_3day,
-- Running total (all rows from start)
SUM(amount) OVER (
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM orders;
24. How do you pivot rows into columns (conditional aggregation)?
SQL has no universal PIVOT syntax (SQL Server has one; standard SQL does not). The portable approach is CASE inside an aggregate.
-- Monthly order counts by status, pivoted into columns
SELECT
DATE_TRUNC('month', created_at) AS month,
COUNT(CASE WHEN status = 'paid' THEN 1 END) AS paid,
COUNT(CASE WHEN status = 'pending' THEN 1 END) AS pending,
COUNT(CASE WHEN status = 'refund' THEN 1 END) AS refunded
FROM orders
GROUP BY 1
ORDER BY 1;
25. Find all employees who earn more than their manager.
A classic self-join question that tests whether you understand how a table can reference itself.
-- employees(id, name, salary, manager_id)
-- manager_id is a foreign key back to id in the same table
SELECT e.name AS employee,
e.salary AS employee_salary,
m.name AS manager,
m.salary AS manager_salary
FROM employees e
JOIN employees m ON e.manager_id = m.id
WHERE e.salary > m.salary;