What is the difference between INNER JOIN and LEFT JOIN?

INNER JOIN returns only rows that match in both tables. LEFT JOIN returns all rows from the left table plus matching rows from the right, with NULLs where there is no match.

What is the difference between DELETE, TRUNCATE, and DROP?

DELETE is DML and removes rows (optionally with WHERE) and can be rolled back. TRUNCATE is DDL and removes all rows quickly, resetting identity. DROP removes the entire table and its structure. In MySQL/Oracle TRUNCATE auto-commits; in PostgreSQL/SQL Server it is transactional.

How does SQL handle NULL values?

NULL means unknown, and any comparison with NULL yields NULL, so you must use IS NULL or IS NOT NULL rather than = NULL. COALESCE replaces NULL with a fallback value.

What is the difference between PRIMARY KEY and UNIQUE?

A PRIMARY KEY uniquely identifies a row, is implicitly NOT NULL, and there is one per table. A UNIQUE constraint also enforces uniqueness but allows NULLs and you can have many per table.

What is the logical execution order of an SQL query?

FROM/JOIN, then WHERE, GROUP BY, HAVING, SELECT, DISTINCT, ORDER BY, and finally LIMIT. This is why a SELECT alias cannot be used in WHERE but can be used in ORDER BY.

What is the difference between RANK, DENSE_RANK, and ROW_NUMBER?

ROW_NUMBER assigns a unique sequential number. RANK leaves gaps after ties. DENSE_RANK handles ties without leaving gaps.

How do you find the Nth highest salary in SQL?

Use DENSE_RANK() OVER (ORDER BY salary DESC) in a CTE and filter where the rank equals N, which handles ties correctly. A quick alternative is ORDER BY salary DESC LIMIT 1 OFFSET N-1.

How do you find employees who earn more than their manager?

Self-join: JOIN employees e ON employees m ON e.manager_id = m.id, then WHERE e.salary > m.salary. Use a regular JOIN (not LEFT JOIN) to ensure you only compare employees who have a manager.

Career

Top 25 SQL Interview Questions (With Answers)

Q: What is the difference between UNION and UNION ALL?

UNION combines two result sets and removes duplicate rows, which requires a sort/hash pass. UNION ALL keeps all rows including duplicates and is faster.

Q: What is the difference between a view and a materialized view?

A view is a stored SELECT query that executes every time you query it — always fresh, no storage. A materialized view stores the query result on disk — faster to read but requires explicit refresh to update.

The 25 questions that actually come up — from DELETE vs TRUNCATE to window functions, views, and self-joins — with verified, dialect-accurate answers and the follow-ups interviewers ask next.

These 25 questions appear most often across data-analyst, backend, and data-engineering interviews. Each answer is verified and notes where databases actually differ — because the difference between "mostly right" and "exactly right" is what gets remembered. Examples target PostgreSQL unless noted. For deeper dives, see SQL JOINs Explained and SQL Window Functions. Keep the SQL cheat sheet open while you study — it covers every syntax pattern these questions reference.

1. What are the categories of SQL commands?

SQL statements group into families by what they do:

DDL (Data Definition): CREATE, ALTER, DROP, TRUNCATE — define or change structure
DML (Data Manipulation): INSERT, UPDATE, DELETE — change data
DQL (Data Query): SELECT — read data (often folded into DML)
DCL (Data Control): GRANT, REVOKE — permissions
TCL (Transaction Control): COMMIT, ROLLBACK, SAVEPOINT — manage transactions

2. PRIMARY KEY vs UNIQUE vs FOREIGN KEY?

PRIMARY KEY: uniquely identifies a row, is implicitly NOT NULL, and there is exactly one per table
UNIQUE: enforces uniqueness too, but allows NULLs and you can have many per table
FOREIGN KEY: references a key in another table and enforces referential integrity

CREATE TABLE orders (
  id      INT PRIMARY KEY,
  email   TEXT UNIQUE,
  user_id INT REFERENCES users(id)  -- foreign key
);

3. DELETE vs TRUNCATE vs DROP?

DELETE — DML; removes rows (optionally with WHERE), fires row triggers, is logged per row, and rolls back inside a transaction
TRUNCATE — DDL; removes ALL rows fast by deallocating pages, takes no WHERE, resets identity/sequence, and skips row triggers
DROP — DDL; removes the entire table, its data, indexes, and structure

4. UNION vs UNION ALL?

Both stack two result sets that have the same number of columns and compatible types. UNION removes duplicate rows (which requires a sort/hash pass); UNION ALL keeps every row and is therefore faster.

SELECT name FROM customers
UNION ALL          -- keep duplicates, no dedup cost
SELECT name FROM leads;

5. Explain the four JOIN types.

INNER JOIN — only rows that match in both tables
LEFT JOIN — all left rows; right columns are NULL where there is no match
RIGHT JOIN — all right rows; left columns are NULL where there is no match
FULL OUTER JOIN — all rows from both sides, matched where possible

SELECT u.name, o.total
FROM users u
LEFT JOIN orders o ON o.user_id = u.id;
-- users with no orders still appear, with o.total = NULL

6. WHERE vs HAVING?

WHERE filters individual rows before aggregation. HAVING filters groups after aggregation. Aggregate functions are not allowed in WHERE.

SELECT user_id, SUM(total) AS revenue
FROM orders
WHERE status = 'paid'        -- filter rows first
GROUP BY user_id
HAVING SUM(total) > 1000;    -- filter groups after

7. How does SQL handle NULL?

NULL means unknown. Any comparison with NULL yields NULL (not TRUE/FALSE), so you must use IS NULL / IS NOT NULL — never = NULL.

-- Wrong: never returns rows
SELECT * FROM users WHERE phone = NULL;

-- Correct
SELECT * FROM users WHERE phone IS NULL;

-- Replace NULL with a fallback
SELECT name, COALESCE(phone, 'N/A') AS phone FROM users;

8. What is the logical execution order of a query?

SQL is written SELECT-first but evaluated in a different order:

FROM / JOIN — build the row source
WHERE — filter rows
GROUP BY — bucket rows
HAVING — filter buckets
SELECT — compute columns and aliases
DISTINCT — drop duplicate rows
ORDER BY — sort
LIMIT / OFFSET — slice

9. What is normalization (1NF, 2NF, 3NF)?

Normalization organizes tables to reduce redundancy and avoid update anomalies.

1NF: atomic values, no repeating groups — each cell holds one value
2NF: 1NF + every non-key column depends on the WHOLE primary key (removes partial dependency on a composite key)
3NF: 2NF + no non-key column depends on another non-key column (removes transitive dependency)
Denormalization: deliberately reintroducing redundancy to speed up reads

10. CTE vs subquery — and what is a correlated subquery?

A subquery is a query nested inside another. A CTE is a named result set defined with WITH, which makes multi-step logic readable and reusable. A correlated subquery references the outer query and re-runs per outer row. See Subqueries vs CTEs for the full breakdown.

-- CTE: named, readable, reusable
WITH pro_users AS (
  SELECT id FROM users WHERE plan = 'pro'
)
SELECT * FROM orders WHERE user_id IN (SELECT id FROM pro_users);

-- Correlated subquery: runs once per outer row
SELECT o.id FROM orders o
WHERE o.total > (
  SELECT AVG(o2.total) FROM orders o2 WHERE o2.user_id = o.user_id
);

11. EXISTS vs IN?

Both test membership. EXISTS stops at the first match and is NULL-safe. IN materializes the list; with a large or NULL-containing subquery it can be slower or behave unexpectedly.

-- EXISTS: short-circuits, NULL-safe
SELECT name FROM users u
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);

12. When do you add an index, and what are index types?

Add: foreign keys, columns frequently used in WHERE / JOIN / ORDER BY
Skip: low-cardinality columns (booleans) and small tables
Clustered index: defines the physical row order — at most one per table
Non-clustered index: a separate structure of pointers — many allowed
Cost: every index slows INSERT/UPDATE/DELETE and uses storage

13. ROW_NUMBER vs RANK vs DENSE_RANK?

SELECT score,
  ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num,  -- 1,2,3,4 (always unique)
  RANK()       OVER (ORDER BY score DESC) AS rnk,   -- 1,2,2,4 (gap after tie)
  DENSE_RANK() OVER (ORDER BY score DESC) AS dense  -- 1,2,2,3 (no gap)
FROM scores;

14. What are the ACID properties?

Atomicity — a transaction is all-or-nothing
Consistency — it moves the database from one valid state to another; constraints hold
Isolation — concurrent transactions don't see each other's intermediate state
Durability — once committed, data survives crashes (written to disk / WAL)

15. Find the Nth highest salary.

DENSE_RANK handles ties correctly and is the preferred answer; OFFSET is the quick alternative.

-- Nth highest (here N = 3), tie-safe
WITH ranked AS (
  SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
  FROM employees
)
SELECT DISTINCT salary FROM ranked WHERE rnk = 3;

-- Quick version
SELECT DISTINCT salary FROM employees
ORDER BY salary DESC LIMIT 1 OFFSET 2;

16. Find and remove duplicate rows.

-- Find
SELECT email, COUNT(*) AS cnt
FROM users GROUP BY email HAVING COUNT(*) > 1;

-- Remove, keeping the lowest id per email
DELETE FROM users
WHERE id NOT IN (SELECT MIN(id) FROM users GROUP BY email);

17. Find customers who never placed an order.

-- Anti-join with LEFT JOIN + IS NULL
SELECT u.id, u.name
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE o.id IS NULL;

-- Equivalent with NOT EXISTS (NULL-safe)
SELECT id, name FROM users u
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);

18. Write a running total.

SELECT order_date, total,
  SUM(total) OVER (
    ORDER BY order_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS running_total
FROM orders;

19. Get the top 3 rows per group.

WITH ranked AS (
  SELECT product_name, category, sales,
    ROW_NUMBER() OVER (PARTITION BY category ORDER BY sales DESC) AS rn
  FROM products
)
SELECT product_name, category, sales
FROM ranked WHERE rn <= 3;

20. Query a self-referencing table (employee → manager).

-- employees(id, name, manager_id) where manager_id -> id
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id
ORDER BY manager, employee;

21. What is the difference between a view and a materialized view?

View: a stored SELECT query that runs every time you query it — always shows fresh data, never stores rows
Materialized view: a snapshot of the query result stored on disk — fast to read but must be manually or automatically refreshed
Use a view when freshness is critical; use a materialized view when the query is expensive and acceptable to be slightly stale

-- Regular view: executes on every SELECT
CREATE VIEW active_users AS
  SELECT * FROM users WHERE active = true;

-- Materialized view: stored on disk, refresh manually
CREATE MATERIALIZED VIEW monthly_revenue AS
  SELECT DATE_TRUNC('month', created_at) AS month,
         SUM(total) AS revenue
  FROM orders GROUP BY 1;

REFRESH MATERIALIZED VIEW monthly_revenue; -- update the snapshot

22. Find the product with the highest sales in each category.

The pattern is: rank within partition, then filter to rank = 1. ROW_NUMBER handles ties by picking one arbitrarily; RANK keeps all tied winners.

WITH ranked AS (
  SELECT category,
         product_name,
         SUM(sales) AS total_sales,
         RANK() OVER (
           PARTITION BY category
           ORDER BY SUM(sales) DESC
         ) AS rnk
  FROM products
  GROUP BY category, product_name
)
SELECT category, product_name, total_sales
FROM ranked
WHERE rnk = 1;

23. What is a window frame — ROWS vs RANGE?

A window frame specifies which rows relative to the current row are included in the window function calculation.

ROWS: counts physical rows — ROWS BETWEEN 2 PRECEDING AND CURRENT ROW includes exactly the 2 rows before and the current row
RANGE: counts rows with the same ORDER BY value as the current row — can include more rows on ties
UNBOUNDED PRECEDING: from the start of the partition
CURRENT ROW: up to (and including) the current row

SELECT order_date, amount,
  -- 3-day moving average (physical rows)
  AVG(amount) OVER (
    ORDER BY order_date
    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  ) AS moving_avg_3day,
  -- Running total (all rows from start)
  SUM(amount) OVER (
    ORDER BY order_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS running_total
FROM orders;

24. How do you pivot rows into columns (conditional aggregation)?

SQL has no universal PIVOT syntax (SQL Server has one; standard SQL does not). The portable approach is CASE inside an aggregate.

-- Monthly order counts by status, pivoted into columns
SELECT
  DATE_TRUNC('month', created_at) AS month,
  COUNT(CASE WHEN status = 'paid'    THEN 1 END) AS paid,
  COUNT(CASE WHEN status = 'pending' THEN 1 END) AS pending,
  COUNT(CASE WHEN status = 'refund'  THEN 1 END) AS refunded
FROM orders
GROUP BY 1
ORDER BY 1;

25. Find all employees who earn more than their manager.

A classic self-join question that tests whether you understand how a table can reference itself.

-- employees(id, name, salary, manager_id)
-- manager_id is a foreign key back to id in the same table
SELECT e.name  AS employee,
       e.salary AS employee_salary,
       m.name  AS manager,
       m.salary AS manager_salary
FROM employees e
JOIN employees m ON e.manager_id = m.id
WHERE e.salary > m.salary;