PostgreSQL Grouping Error (SQLSTATE 42803)

When a query violates PostgreSQL's grouping rules, you will see an error like:

ERROR:  column "orders.customer_name" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT customer_name, SUM(amount) FROM orders;
               ^
SQLSTATE: 42803

The condition name is grouping_error, SQLSTATE 42803, and it means a column referenced in SELECT, ORDER BY, or HAVING is neither aggregated nor listed in the GROUP BY clause.

What This Error Means

SQLSTATE 42803 belongs to PostgreSQL error class 42 — "Syntax Error or Access Rule Violation". Despite the class name, this is a semantic error caught during query analysis, not a runtime error. PostgreSQL raises it during the planning phase before any data is touched, so no rows are read and no transaction state is affected. You can retry the corrected query immediately.

The rule PostgreSQL enforces is a core property of the relational model: when you use GROUP BY, the query collapses multiple rows into a single output row per group. Every column in SELECT, ORDER BY, and HAVING must be either (a) listed in GROUP BY — meaning it has the same value for all rows in the group — or (b) wrapped in an aggregate function (SUM, COUNT, MAX, MIN, AVG, etc.) that reduces the group to a single value. Any column that is neither creates an ambiguity: which row's value should PostgreSQL use? PostgreSQL refuses to guess, unlike some other databases.

PostgreSQL is stricter here than MySQL's default mode (prior to ONLY_FULL_GROUP_BY being enabled). If you are migrating queries from MySQL or SQLite, you will frequently encounter 42803 for queries those databases accepted silently — sometimes returning arbitrary values from the group.

Common Causes

  1. Missing column in GROUP BY — The most frequent cause: a SELECT list includes a plain column alongside an aggregate, but that column is not listed in GROUP BY.

    -- Fails: customer_name is not grouped
    SELECT customer_name, SUM(amount)
    FROM orders
    GROUP BY customer_id;
    
  2. Selecting a non-key column from a joined table — Joining tables and selecting columns from the joined table without including them in GROUP BY.

    -- Fails: p.product_name is not in GROUP BY
    SELECT p.product_name, COUNT(o.id)
    FROM orders o
    JOIN products p ON o.product_id = p.id
    GROUP BY o.product_id;
    
  3. ORDER BY on a non-aggregated, non-grouped columnORDER BY has the same grouping requirement as SELECT.

    -- Fails: created_at is not grouped or aggregated
    SELECT customer_id, COUNT(*)
    FROM orders
    GROUP BY customer_id
    ORDER BY created_at;
    
  4. HAVING clause referencing a bare column — Using a raw column (not an aggregate) in HAVING that is not in GROUP BY.

    -- Fails: status is not in GROUP BY
    SELECT customer_id, COUNT(*)
    FROM orders
    GROUP BY customer_id
    HAVING status = 'active';
    
  5. Subquery or CTE with implicit grouping — A correlated subquery that performs aggregation but accidentally references an outer column in a way PostgreSQL cannot resolve.

How to Fix grouping_error

  1. Add the missing column to GROUP BY — The simplest fix when you genuinely want that column as a grouping dimension.

    -- Fixed: add customer_name to GROUP BY
    SELECT customer_name, SUM(amount)
    FROM orders
    GROUP BY customer_id, customer_name;
    
  2. Use an aggregate function — If you only need one representative value from the group and all rows have the same value, or you need max/min/first, wrap the column in an aggregate.

    -- Use MAX when all rows in the group share the same value
    SELECT customer_id, MAX(customer_name), SUM(amount)
    FROM orders
    GROUP BY customer_id;
    
  3. Use DISTINCT ON instead of GROUP BY — When the goal is deduplication rather than aggregation, DISTINCT ON can be clearer.

    SELECT DISTINCT ON (customer_id)
        customer_id, customer_name, amount
    FROM orders
    ORDER BY customer_id, created_at DESC;
    
  4. Use a subquery or CTE to separate aggregation from selection — Aggregate first, then join back for the non-aggregated columns.

    WITH totals AS (
        SELECT customer_id, SUM(amount) AS total_amount
        FROM orders
        GROUP BY customer_id
    )
    SELECT c.customer_name, t.total_amount
    FROM totals t
    JOIN customers c ON t.customer_id = c.id;
    
  5. Fix ORDER BY to use only grouped or aggregated columns — Replace the bare column with an aggregate or add it to GROUP BY.

    -- Fixed: order by the aggregate
    SELECT customer_id, COUNT(*), MAX(created_at) AS latest_order
    FROM orders
    GROUP BY customer_id
    ORDER BY latest_order DESC;
    

Additional Information

  • SQLSTATE class 42 errors in PostgreSQL include related grouping and syntax violations: 42P20 (windowing_error, raised when a window function is used in a context it is not allowed), and 42803 is specific to GROUP BY violations.
  • PostgreSQL has enforced this standard since version 7.4. There is no compatibility mode to relax it — unlike MySQL where ONLY_FULL_GROUP_BY can be disabled.
  • ORMs such as ActiveRecord (Rails), SQLAlchemy, and Django ORM generally construct valid GROUP BY clauses automatically when using their aggregation APIs. You are most likely to see 42803 when writing raw SQL via .execute(), connection.raw(), or similar escape hatches.
  • The error is raised at parse/analysis time, before execution. It does not consume any query resources and does not affect transaction state.
  • PostgreSQL's FILTER clause on aggregates (COUNT(*) FILTER (WHERE status = 'active')) is a clean alternative to HAVING when filtering within aggregation, and can help avoid the need to add columns to GROUP BY just to satisfy HAVING.

Frequently Asked Questions

Why does this query work in MySQL but fail in PostgreSQL? MySQL historically defaulted to allowing non-deterministic grouping, picking an arbitrary row's value for ungrouped columns. PostgreSQL (and the SQL standard) require every selected column to be deterministically derivable from the group. MySQL added the ONLY_FULL_GROUP_BY SQL mode in version 5.7.5, which enforces the same rule — but it may not be enabled on older installations.

Can I suppress this error if I know all rows in the group have the same value? Not directly — PostgreSQL has no hint mechanism for this. The idiomatic solution is to wrap the column in MAX() or MIN(), which is semantically correct when all values in the group are identical. Some teams also use ANY_VALUE() available as an extension; in standard PostgreSQL, MAX() is the conventional idiom.

Does this error affect my transaction? No. SQLSTATE 42803 is raised during query analysis, before execution begins. Your transaction remains open and unaffected. You can correct the query and retry within the same transaction.

I'm using GROUP BY with a primary key — why am I still getting 42803? If you group by a table's primary key, PostgreSQL does not automatically infer that all other columns of that table are functionally determined. You must either list them in GROUP BY or aggregate them. The SQL standard's concept of "functional dependency" (where grouping by a primary key implies all columns of that table are available) is supported in PostgreSQL since version 9.1 — but only when the primary key constraint is present and the query planner can detect the dependency. If the error persists, verify the constraint exists and is not deferrable.

Subscribe to the Pulse Newsletter

Get early access to new Pulse features, insightful blogs & exclusive events , webinars, and workshops.

We use cookies to provide an optimized user experience and understand our traffic. To learn more, read our use of cookies; otherwise, please choose 'Accept Cookies' to continue using our website.