Beginner

GROUP BY & Aggregate Functions: A Practical Guide

COUNT, SUM, AVG, MIN, MAX and GROUP BY are how you turn raw rows into answers. Learn the patterns that show up in every real report.

Aggregate functions collapse many rows into a single summary value. Combined with GROUP BY, they answer almost every reporting question: revenue per customer, orders per day, average rating per product, signups per country. If you can write a clean GROUP BY, you can build most of a dashboard.

What are aggregates?

The five core aggregates are COUNT, SUM, AVG, MIN, and MAX. Used alone — without GROUP BY — they reduce the entire table to a single row.

-- One summary row for the whole table
SELECT
  COUNT(*) AS total_orders,
  SUM(total) AS revenue,
  AVG(total) AS avg_order_value,
  MIN(total) AS smallest,
  MAX(total) AS largest
FROM orders;

GROUP BY basics

GROUP BY splits rows into buckets, then runs the aggregate once per bucket. The rule that trips up every beginner: every column in the SELECT must either appear in the GROUP BY or be wrapped in an aggregate. There is no "just show me one of them" — the database needs to know exactly what to do with each column.

GROUP BY category — Five input rows collapse into one row per group, with SUM(value) computed for each. This is the defining behavior of GROUP BY — many rows in, one summary row per bucket out.

-- Revenue per customer
SELECT user_id, SUM(total) AS revenue
FROM orders
GROUP BY user_id
ORDER BY revenue DESC;

The order SQL runs in

SQL does not execute in the order you write it. Understanding the logical order explains why you can use a column alias in ORDER BY but not in WHERE, and why HAVING can reference an aggregate but WHERE cannot.

logical execution order — SQL runs in this order — not the order you write it. That is why WHERE cannot see SELECT aliases (SELECT runs later) and why HAVING, not WHERE, filters aggregates.

FROM / JOIN — assemble the rows
WHERE — filter individual rows (no aggregates allowed here)
GROUP BY — bucket the rows
HAVING — filter the buckets (aggregates allowed)
SELECT — compute the output columns and aliases
ORDER BY — sort the final result (can use SELECT aliases)

Grouping by multiple columns

List several columns to create one bucket per unique combination. This is how you build cross-tabulated summaries — orders per status per day, sales per region per month.

-- Orders per status, per day
SELECT
  order_date,
  status,
  COUNT(*) AS orders
FROM orders
GROUP BY order_date, status
ORDER BY order_date, status;

Filtering groups with HAVING

WHERE filters rows before grouping. HAVING filters the groups after aggregation. If your condition uses an aggregate, it belongs in HAVING; if it filters individual rows, keep it in WHERE so the database has fewer rows to group.

-- Customers who spent more than 1000
SELECT user_id, SUM(total) AS revenue
FROM orders
WHERE status = 'paid'      -- filter rows first (cheaper)
GROUP BY user_id
HAVING SUM(total) > 1000;  -- filter groups after

COUNT DISTINCT & FILTER

COUNT(DISTINCT col) counts unique values, not rows — perfect for "how many distinct customers ordered each day". The FILTER clause lets you apply different conditions to different aggregates in a single pass, which is far cleaner than several correlated subqueries.

SELECT
  order_date,
  COUNT(*) AS all_orders,
  COUNT(DISTINCT user_id) AS unique_customers,
  COUNT(*) FILTER (WHERE status = 'paid') AS paid_orders,
  SUM(total) FILTER (WHERE status = 'paid') AS paid_revenue
FROM orders
GROUP BY order_date
ORDER BY order_date;

Subtotals with ROLLUP

ROLLUP adds subtotal and grand-total rows to a grouped result — exactly what a financial report needs. GROUP BY ROLLUP (a, b) produces totals for each (a, b), each a, and the whole set.

-- Revenue per status per day, plus daily and grand totals
SELECT order_date, status, SUM(total) AS revenue
FROM orders
GROUP BY ROLLUP (order_date, status)
ORDER BY order_date, status;

How aggregates treat NULL

SUM, AVG, MIN, MAX all ignore NULL values entirely
AVG divides by the count of non-NULL values, not the row count
COUNT(*) includes NULLs; COUNT(column) excludes them
An aggregate over zero matching rows returns NULL (except COUNT, which returns 0)
GROUP BY treats all NULLs as a single group

Common mistakes

Putting an aggregate in WHERE instead of HAVING
Forgetting a SELECT column in the GROUP BY clause
Assuming AVG counts NULL rows in its denominator
Using COUNT(column) when you meant COUNT(*)
Reaching for a column alias in WHERE before it exists

Ready to keep your rows visible while you aggregate? That's exactly what window functions do — and when a group filter gets complex, a CTE keeps it readable.