A GIN (Generalized Inverted Index) on a jsonb column is the standard way to make containment and existence queries fast in PostgreSQL. It indexes the keys and values inside each document, so a query like data @> '{"status": "active"}' can find matching rows without reading every row. The trade-off is write cost and index size: GIN indexes are slower to update than B-tree indexes and can grow large, so the right strategy depends on which operators your queries actually use.
How JSONB Is Stored and Why It Affects Queries
jsonb stores a decomposed binary representation, not the original text. On write, PostgreSQL parses the JSON, drops insignificant whitespace, deduplicates keys (last value wins), and sorts keys so lookups inside a document are fast. That parsing makes jsonb writes marginally slower than json, which stores the raw text verbatim - but reads skip reparsing, which is why jsonb is the right default for anything you query. The full comparison lives in JSON vs JSONB.
Large documents are stored out of line. When a row exceeds roughly 2 KB (the TOAST threshold, TOAST_TUPLE_THRESHOLD), PostgreSQL compresses and moves the jsonb value into a TOAST table. A query that extracts one key from a large TOASTed document still has to fetch and de-TOAST the whole value, so wide JSONB blobs cost more per access than the key you wanted suggests.
Without an index, every jsonb predicate forces a sequential scan that de-TOASTs and inspects each document. On a table of any size that is the bottleneck, and it is the symptom that sends people looking for a GIN index.
GIN Indexes: jsonb_ops vs jsonb_path_ops
A default GIN index uses the jsonb_ops operator class. It creates one index entry per key and per value in the document, which makes it general: it supports the containment operator @>, the existence operators ?, ?|, and ?&, and the jsonpath operators @? and @@ (PostgreSQL 12+).
-- Default GIN index, supports @>, ?, ?|, ?&, @?, @@
CREATE INDEX idx_events_data ON events USING gin (data);
-- Containment: rows whose data contains this subset
SELECT * FROM events WHERE data @> '{"status": "active"}';
-- Key existence: rows whose top-level object has a "user_id" key
SELECT * FROM events WHERE data ? 'user_id';
The jsonb_path_ops operator class indexes only hashes of full paths to values, not individual keys. It produces a smaller index and faster @> lookups, because it does not create entries for bare keys. The cost is reduced operator coverage: jsonb_path_ops supports @>, @?, and @@, but not the key-existence operators ?, ?|, ?&. If your workload is almost entirely containment queries, it is usually the better choice.
-- Smaller, faster for @> only; does NOT support ? ?| ?&
CREATE INDEX idx_events_data_path ON events USING gin (data jsonb_path_ops);
GiST is sometimes mentioned for JSONB, but it is rarely the right tool. GIN is the standard choice for jsonb; reach for GiST only in specialized cases such as range or geometric data, not general document indexing.
| Strategy | Operators supported | Size / speed | When to use |
|---|---|---|---|
GIN jsonb_ops (default) |
@>, ?, `? |
, ?&, @?, @@` |
Larger index, broad coverage |
GIN jsonb_path_ops |
@>, @?, @@ only |
Smaller, faster @> |
Workload is mostly @> containment |
Expression B-tree on ->> |
=, <, >, range, ORDER BY |
Smallest, supports sorting | Filtering or sorting on one scalar key |
Expression Indexes on a Specific Key
When queries always filter or sort on one scalar inside the document, a full GIN index is overkill. An expression (functional) B-tree index on the extracted value is smaller, supports equality and range scans, and can satisfy ORDER BY. Use ->> to extract the value as text, and the query predicate must use the exact same expression for the planner to match the index.
-- B-tree on a single extracted scalar key
CREATE INDEX idx_events_user ON events ((data ->> 'user_id'));
-- Matches the index because the expression is identical
SELECT * FROM events WHERE data ->> 'user_id' = '12345';
Cast inside the index expression if you compare numerically, so (data ->> 'amount')::numeric sorts and ranges as a number rather than as text. You can also build a partial expression index when only a subset of rows is queried, which shrinks the index further. If the index is in place but the planner ignores it, the cause is usually an expression mismatch or a type mismatch - see why an index is not used.
You can also build an expression GIN index on a nested object path, for example CREATE INDEX ON events USING gin ((data -> 'attributes')), when containment queries always target one branch of a large document. Indexing the sub-object instead of the whole column keeps the index narrow and the scans cheaper.
The Write Cost of GIN Indexes
GIN indexes are expensive to maintain on writes because a single inserted document can touch many index entries - one per key and value. To soften this, GIN buffers new entries in a pending list and merges them in bulk, controlled by fastupdate (on by default) and gin_pending_list_limit (default 4MB). Inserts append to the pending list cheaply; the list is flushed into the main index structure during VACUUM, by autovacuum, or when it overflows the limit.
The pending list is a trade-off, not free speed. A large pending list slows down reads, because queries must scan the unmerged list in addition to the main index. Workloads that are write-heavy benefit from the default fastupdate; read-latency-sensitive workloads sometimes turn it off or lower gin_pending_list_limit so the list stays short.
-- Tune pending-list behavior per index
ALTER INDEX idx_events_data SET (fastupdate = off);
ALTER INDEX idx_events_data SET (gin_pending_list_limit = '2MB');
This write amplification is the real reason a GIN index can hurt. A common production fix is the one this article's source case ran into: a query like data -> 'attributes' @> '{"flag": true}' over a wide JSONB column scanned 10 GB because the engine still de-TOASTed full documents. Pulling the hot flags into typed columns with B-tree indexes, and moving the raw JSON to a side table read only on demand, cut both the scan size and the write cost.
Diagnosing this by hand means correlating pg_stat_statements, EXPLAIN (ANALYZE, BUFFERS) plans, and index-size growth over time. Pulse watches that loop for PostgreSQL: it flags when a jsonb predicate is driving sequential scans or when a GIN index's write overhead and bloat are outpacing its read benefit, then recommends the concrete change - switch to jsonb_path_ops, add an expression index, or extract the key into a column - for a human to approve.
Frequently Asked Questions
Q: When should I use jsonb_path_ops instead of the default jsonb_ops?
A: Use jsonb_path_ops when your queries are almost entirely containment checks with @>. It builds a smaller index and runs @> faster because it indexes hashed value paths rather than every key. Stay on the default jsonb_ops if you also need the key-existence operators ?, ?|, or ?&, which jsonb_path_ops does not support.
Q: Why is my JSONB query slow even with a GIN index?
A: Common causes are using an operator the index does not cover (a jsonb_path_ops index will not help a ? existence query), extracting with ->> against a GIN index built for containment, or de-TOASTing large documents row by row. Run EXPLAIN (ANALYZE, BUFFERS) to confirm the index is used and to see how many blocks are read.
Q: Should I index the whole JSONB column or just one key?
A: Index one key with an expression B-tree on (data ->> 'key') when queries always filter or sort on that single scalar - it is smaller and supports ranges and sorting. Use a GIN index on the whole column when queries match arbitrary keys and values with @> or the existence operators.
Q: How much do GIN indexes slow down writes in PostgreSQL?
A: GIN inserts touch one entry per key and value, so they cost more than B-tree inserts. The fastupdate mechanism (on by default) buffers new entries in a pending list sized by gin_pending_list_limit (default 4MB) and merges them in bulk, which keeps insert latency low at the cost of slightly slower reads until the list is flushed.
Q: Does a B-tree expression index work for JSONB containment queries?
A: No. A B-tree expression index on (data ->> 'key') accelerates equality, range, and sort on that one extracted scalar, but it cannot serve @> containment or ? existence queries. For those, use a GIN index on the jsonb column.
Q: What is the @@ operator and how does it relate to GIN indexes?
A: @@ evaluates a jsonpath predicate against a jsonb value and returns the result (PostgreSQL 12+), and @? checks whether a jsonpath matches at all. Both are supported by jsonb_ops and jsonb_path_ops GIN indexes, so a GIN index can accelerate jsonpath queries the same way it accelerates @>.
Q: Why does extracting one key from a large JSONB document feel slow?
A: When a jsonb value exceeds the TOAST threshold (about 2 KB), PostgreSQL stores it compressed in a TOAST table. Reading any single key requires fetching and decompressing the entire value, so wide documents cost more per access than the size of the key you read. Splitting hot fields into their own columns avoids the de-TOAST cost.
Related Reading
- JSON vs JSONB in PostgreSQL: Storage differences and which type to choose.
- The JSONB Contains Operator (@>): Syntax and indexing for containment queries.
- The jsonb_each Function: Expanding a JSONB object into key/value rows.
- Creating Indexes in PostgreSQL: Syntax and options for B-tree, GIN, and expression indexes.
- How PostgreSQL Indexes Work: Index types and the access paths the planner can use.
- Why a PostgreSQL Index Is Not Used: Diagnosing expression and type mismatches that defeat an index.