PostgreSQL has supported JSON storage since version 9.2, and JSONB since 9.4. Despite the similar names, they are fundamentally different types with different storage layouts, operator sets, and performance characteristics. Picking the wrong one - or using either when a normalized schema would be better - is a common source of production performance issues that only show up at scale.
JSON vs JSONB: How Storage Differs
The json type stores an exact copy of the input text, validated but otherwise untouched. Every time you apply an operator or function to a json value, PostgreSQL reparses that text from scratch. The upside is fidelity: key order is preserved, duplicate keys are retained, and whitespace is stored as-is. The downside is that parsing overhead hits every query that touches the column's contents.
jsonb stores data in a decomposed binary format. On write, PostgreSQL parses the JSON, discards insignificant whitespace, deduplicates keys (last value wins), and sorts keys. This makes writes marginally slower, but reads skip reparsing entirely. Operators and functions work directly on the binary representation.
There is one behavioral difference worth knowing: if your application depends on key insertion order or deliberately writes duplicate keys in a JSON object, json preserves both. In practice, very few legitimate use cases rely on this. For almost every production workload, jsonb is the right default.
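A quick way to see the normalization is to cast the same throwaway literal to both types:

-- json preserves the input text exactly: key order, duplicates, spacing
SELECT '{"b": 1, "a": 2, "a": 3}'::json;
-- returns {"b": 1, "a": 2, "a": 3}

-- jsonb normalizes on write: keys sorted, duplicate keys collapsed (last wins)
SELECT '{"b": 1, "a": 2, "a": 3}'::jsonb;
-- returns {"a": 3, "b": 1}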
Operators You Will Actually Use
PostgreSQL ships a rich set of operators for these types; the extraction operators work on both, while containment, existence, concatenation, and deletion exist only for jsonb. The most commonly used ones are:
- -> extracts a field, returning the same type as the input column (json or jsonb). data -> 'address' returns the nested object as json or jsonb depending on the column type.
- ->> extracts a field as text. data ->> 'email' returns a plain string, useful for comparisons and output.
- #> navigates a path given as an array of keys: data #> '{address,city}' drills into nested objects.
- #>> does the same but returns text.
- @> tests containment. data @> '{"status": "active"}'::jsonb returns true if the left side contains the right side as a subset.
- ? tests for key existence. data ? 'phone' is true if the top-level object has a key named phone.
- ?| checks if any key in an array exists. data ?| array['phone', 'mobile'] returns true if either key is present.
- ?& requires all keys to be present.
- || concatenates two jsonb values, merging documents: data || '{"new_key": "val"}'.
- - deletes a key or element: data - 'key' removes a top-level key.
- #- deletes at a path: data #- '{address,zip}' removes a nested field.
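A few of these in context, against a hypothetical users table with a jsonb profile column:

-- Extraction: -> returns jsonb, ->> returns text
SELECT profile ->> 'email'
FROM users
WHERE profile -> 'address' ->> 'city' = 'Portland';

-- Containment: rows whose profile includes this key/value pair
SELECT count(*) FROM users WHERE profile @> '{"status": "active"}';

-- Merge in a new key and drop an old one in a single expression
SELECT (profile || '{"verified": true}') - 'legacy_flag' FROM users;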
For updates, jsonb_set(target, path, new_value) replaces or inserts a value at a given path and returns a new complete document:
UPDATE users
SET profile = jsonb_set(profile, '{address,zip}', '"94107"')
WHERE profile ? 'address';
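jsonb_set also accepts an optional fourth argument, create_if_missing (default true). Passing false restricts the call to replacing paths that already exist, rather than silently creating structure on rows with an unexpected shape:

UPDATE users
SET profile = jsonb_set(profile, '{address,zip}', '"94107"', false)
WHERE profile ? 'address';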
PostgreSQL 12 also introduced native jsonpath support, which provides a path expression language modeled after SQL/JSON. You can use it with the @? operator (path existence) or @@ (path match returning boolean), enabling queries like:
SELECT * FROM orders WHERE data @? '$.items[*] ? (@.price > 100)';
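The @@ variant evaluates a jsonpath predicate directly and returns its boolean result, which reads naturally for scalar comparisons (the total field here is illustrative):

SELECT * FROM orders WHERE data @@ '$.total > 500';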
Note that jsonb_set does not modify in place: it builds and returns a complete new document on every call, so changing one field rewrites the whole value. For large documents, this write amplification matters - more on that below.
Indexing JSONB with GIN
A plain B-tree index on a jsonb column only supports whole-value comparisons, which is rarely what you want; the only way to B-tree index something inside a document is a functional index on an extracted scalar. For querying inside documents, GIN (Generalized Inverted Index) is what PostgreSQL provides.
Two operator classes exist:
jsonb_ops (the default) creates index entries for every key and every value in the document. It supports ?, ?|, ?&, @>, @?, and @@. The index is comprehensive but large - it can reach 60-80% of the table's data size for wide documents with many distinct keys.
jsonb_path_ops indexes only values, hashed together with the path of keys leading to them. It supports @>, @?, and @@, but not key-existence operators. The index is substantially smaller and typically faster for containment queries, especially when keys are high-cardinality or queries target deeply nested paths.
-- Default operator class
CREATE INDEX idx_users_profile ON users USING GIN (profile);
-- Smaller, faster for containment queries
CREATE INDEX idx_users_profile_path ON users USING GIN (profile jsonb_path_ops);
The choice of operator class must match the operators in your queries. If your queries use ? for key existence but your index was built with jsonb_path_ops, PostgreSQL will not use the index - it will fall back to a sequential scan silently. Always EXPLAIN ANALYZE your queries after adding a GIN index to verify it is being used.
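A minimal verification pass, assuming the idx_users_profile index from above:

EXPLAIN ANALYZE
SELECT * FROM users WHERE profile @> '{"status": "active"}';
-- A healthy plan shows a Bitmap Index Scan on idx_users_profile;
-- a Seq Scan on users means the operator and operator class do not match.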
For queries that always filter on a specific extracted key, a functional B-tree index is often smaller and faster than GIN:
CREATE INDEX idx_users_status ON users ((profile ->> 'status'));
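The planner only considers an expression index when the query repeats the indexed expression, so filters must be written against profile ->> 'status' (compared as text):

SELECT * FROM users WHERE profile ->> 'status' = 'active';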
JSONB vs a Normalized Schema
JSONB handles genuinely variable or sparse data well. If a table models entities with a different set of optional attributes per row - say, product metadata where a laptop has specs a book does not - putting those attributes in a jsonb column avoids dozens of nullable columns. It also lets you add new attributes without schema migrations, which matters when the attribute set is controlled by external systems or user configuration.
The costs are concrete. PostgreSQL does not maintain per-value statistics inside jsonb columns the way it does for scalar columns. Before PostgreSQL 13, the query planner fell back unconditionally to a fixed selectivity estimate of around 0.1% for JSONB predicates. PostgreSQL 13 improved this: when sufficient data exists, the planner can use MCV (most common values) and histogram data for better selectivity estimates on containment operators. PostgreSQL 14 also added support for extended statistics on expressions, so CREATE STATISTICS on a JSONB expression becomes possible. When statistics are absent or insufficient, the planner still falls back to the fixed estimate, which can cause catastrophically bad plan choices when JSONB predicates are combined with other filters. A normalized column with a standard index gives the planner real statistics to work with.
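A sketch of the PostgreSQL 14+ option mentioned above - per-expression extended statistics on a frequently filtered key, so the planner works from real numbers instead of the fallback estimate:

CREATE STATISTICS users_profile_status_stats
    ON (profile ->> 'status') FROM users;
ANALYZE users;  -- statistics are only populated by the next ANALYZE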
Storage is another real cost. PostgreSQL stores each key string in every row with no dictionary compression at the schema level, so a jsonb column holding objects with ten fixed keys repeats those key strings in every row. Measured against a normalized equivalent, JSONB tables commonly use 2x the disk space. Heap bloat follows from write amplification: jsonb_set always rewrites the entire value, not just the changed field. Documents larger than roughly 2KB are moved out-of-line into TOAST storage, so every subsequent update means full decompression, modification, recompression, and a new row version in the heap, all of it logged to the WAL.
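Whether this cost bites is measurable. A quick audit, assuming the users table from earlier:

-- Average stored size of the jsonb value per row, in bytes
SELECT avg(pg_column_size(profile))::int AS avg_profile_bytes FROM users;

-- Total footprint of the table, including TOAST and indexes
SELECT pg_size_pretty(pg_total_relation_size('users'));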
The practical heuristic: if you know your schema and it is stable, use columns. If the shape of data varies meaningfully per row and you do not need to run analytical aggregations across those fields, jsonb is a reasonable fit. Avoid storing large documents (multi-kilobyte) that get frequent partial updates in jsonb columns. Avoid using jsonb as a workaround for schema design work that simply has not been done yet - that is a debt that tends to compound.
A hybrid approach often works well: core queryable fields as regular columns with proper indexes, and a jsonb column for the variable remainder. This keeps planner statistics healthy for the fields that drive most queries, and confines JSONB to where it genuinely earns its place.
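A sketch of that hybrid layout, with hypothetical column names:

CREATE TABLE products (
    id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    sku         text NOT NULL UNIQUE,        -- hot, queryable fields as real columns
    price_cents integer NOT NULL,
    category    text NOT NULL,
    attrs       jsonb NOT NULL DEFAULT '{}'  -- variable per-category remainder
);

-- Scalar columns get normal indexes and real planner statistics;
-- GIN covers only the variable remainder, and only for containment
CREATE INDEX idx_products_category ON products (category);
CREATE INDEX idx_products_attrs ON products USING GIN (attrs jsonb_path_ops);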