Choosing the right PostgreSQL data type matters more than most engineers expect. A poorly chosen type doesn't just waste storage - it can silently corrupt calculations, make queries slower, complicate migrations, and push business logic into application code that the database could enforce on its own. PostgreSQL offers a richer type system than most relational databases, which gives you real modeling power, but also more ways to get things subtly wrong.
Numeric Types
PostgreSQL gives you three practical numeric families: exact integers, arbitrary-precision decimals, and floating-point types.
integer (4 bytes, range -2,147,483,648 to 2,147,483,647) is the right default for most whole-number columns - IDs, counters, ages, quantities. When that range isn't enough, bigint (8 bytes) covers -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. Don't default to bigint for everything just to be safe - it does carry some extra overhead in index and join operations, and on very write-heavy tables the difference is measurable. smallint (2 bytes) is worth using for truly bounded values like status codes or small enumerations where you want to document that constraint in the schema.
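As a concrete sketch (the table and column names here are illustrative, not from any particular schema), the sizing decisions might look like this:

CREATE TABLE order_items (
    id       bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  -- expected to outgrow the integer range
    order_id bigint NOT NULL,
    quantity integer NOT NULL CHECK (quantity > 0),             -- whole-number count, integer is plenty
    status   smallint NOT NULL DEFAULT 0                        -- small, bounded enumeration
);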
numeric (also called decimal) stores exact arbitrary-precision values. For money, never use floating-point. Use numeric(precision, scale) - for example, numeric(19, 4) for most financial applications. The trade-off is performance: arithmetic on numeric goes through software-based decimal logic and is substantially slower than integer arithmetic. If you're doing millions of aggregations per second, this adds up. real (4 bytes, ~6 decimal digits of precision) and double precision (8 bytes, ~15 decimal digits) are IEEE 754 floating-point types. They're fast and fine for scientific measurements, physics simulations, or statistics - anywhere approximation is acceptable. Store currency in them and you'll eventually see a row where $10.10 becomes $10.099999999999999 in a report.
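The drift is easy to reproduce with plain psql queries, nothing schema-specific:

-- double precision accumulates representation error; numeric does not
SELECT (0.1::double precision + 0.2) = 0.3;   -- false
SELECT (0.1::numeric + 0.2) = 0.3;            -- true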
Character Types
PostgreSQL has three string types: char(n), varchar(n), and text. The short answer is: use text.
The longer answer: all three share the same underlying storage mechanism. There is no performance difference between text and varchar - the PostgreSQL docs are explicit on this. varchar(n) adds a length check on every write, which is marginal overhead, not a meaningful performance concern. The real reason to choose text is that length limits on string columns are almost never a schema-level concern. If you're limiting a username to 50 characters, that's an application rule that can change. Since PostgreSQL 9.2, increasing the limit of a varchar(50) column (e.g., to varchar(200)) is a catalog-only change with no table rewrite, and so is converting the column to text; reducing the limit or changing to an incompatible type still rewrites the table. A CHECK constraint on a text column is both more explicit and easier to change.
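A minimal sketch of that approach (the users table and the 50-character rule are assumptions for illustration):

CREATE TABLE users (
    id       bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    username text NOT NULL CONSTRAINT username_length CHECK (length(username) <= 50)
);

-- Raising the limit later is a constraint swap, not a column type change:
ALTER TABLE users DROP CONSTRAINT username_length;
ALTER TABLE users ADD CONSTRAINT username_length CHECK (length(username) <= 200);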
char(n) pads short strings with spaces to fill the declared width. It does not save storage for short values - in fact it wastes it. There's no meaningful use case for char(n) in modern PostgreSQL applications. Most of its legacy comes from databases like Oracle and SQL Server where the type had different semantics. For fixed-format codes that must always be exactly N characters (like ISO country codes), char(n) is sometimes used by convention, but a CHECK (length(col) = N) on a text column is clearer about intent.
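For instance (the countries table is hypothetical), the text-plus-CHECK version of a fixed-width code looks like:

CREATE TABLE countries (
    code text PRIMARY KEY CHECK (length(code) = 2),
    name text NOT NULL
);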
Date and Time Types
PostgreSQL's temporal types include date, time, timestamp, timestamptz, and interval. The most consequential decision here is between timestamp and timestamptz.
timestamp stores a date and time with no timezone context. It records exactly what you put in - like a picture of a calendar and clock. timestamptz (timestamp with time zone) internally stores UTC and converts on output to the session's TimeZone setting. Critically, both types use 8 bytes - there is no storage advantage to timestamp. The trap with timestamp is that when your application servers span multiple timezones, or when you change a server's timezone, the stored values suddenly represent different moments in time. For any column that records when something happened, use timestamptz. The PostgreSQL wiki's "Don't Do This" page explicitly advises against timestamp for most use cases.
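The behavioral difference is easy to see in psql (the session settings below are just for the demonstration):

SET TimeZone = 'UTC';
SELECT '2024-01-01 12:00+00'::timestamptz;  -- 2024-01-01 12:00:00+00
SET TimeZone = 'America/New_York';
SELECT '2024-01-01 12:00+00'::timestamptz;  -- 2024-01-01 07:00:00-05, same instant, shown in the session zone
SELECT '2024-01-01 12:00'::timestamp;       -- 2024-01-01 12:00:00, no zone context at all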
date is appropriate when the time of day is irrelevant - a birth date, a subscription expiration date, an all-day calendar event. interval is the right type for durations: '3 days'::interval or '1 hour 30 minutes'::interval. Don't store durations as integers of seconds and reconstruct them in application code. One additional note: avoid timetz (time with time zone) - the PostgreSQL documentation acknowledges it exists only for SQL standard compliance and is not practically useful.
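A couple of illustrative queries:

SELECT now() + '1 hour 30 minutes'::interval;     -- timestamptz arithmetic with a duration
SELECT '2025-03-01'::date - '2025-01-15'::date;   -- 45, subtracting dates yields whole days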
Boolean, UUID, and Arrays
boolean in PostgreSQL is a proper type, not a tinyint. It accepts true, false, 't', 'f', 'yes', 'no', 'on', 'off', '1', and '0' as input. Use it instead of integer flags - the query planner handles it correctly and it's self-documenting.
uuid stores a 128-bit Universally Unique Identifier as 16 bytes, more compact than the 36-byte text representation. Since PostgreSQL 13, gen_random_uuid() is built-in and generates UUIDv4 (random) values. The known performance problem with UUIDv4 as a primary key is index fragmentation: random values cause B-tree page splits and poor locality. PostgreSQL 18 adds a native uuidv7() function that generates time-ordered UUIDs, which have sequential-ish ordering and substantially better index locality for insert-heavy tables.
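A sketch of a UUID primary key (the events table is hypothetical; uuidv7() assumes PostgreSQL 18 or later, while gen_random_uuid() works from 13 onward):

CREATE TABLE events (
    id         uuid PRIMARY KEY DEFAULT uuidv7(),
    payload    jsonb,
    created_at timestamptz NOT NULL DEFAULT now()
);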
PostgreSQL arrays let a single column hold a variable-length ordered list of any base type: integer[], text[], uuid[]. They're useful when the values are genuinely atomic and you'd never need to join on or filter by individual elements at scale. For example, storing a list of tag strings on a document works well as text[] with a GIN index. Where arrays break down is when you start needing foreign key references, per-element attributes, or complex filtering - at that point, a proper join table is almost always cleaner.
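For example (the documents table and its columns are illustrative):

CREATE TABLE documents (
    id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    body text NOT NULL,
    tags text[] NOT NULL DEFAULT '{}'
);
CREATE INDEX documents_tags_idx ON documents USING gin (tags);

-- Containment: documents tagged with both 'postgres' and 'types'
SELECT id FROM documents WHERE tags @> ARRAY['postgres', 'types'];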
JSONB, hstore, Range Types, and Composite Types
jsonb stores binary-encoded JSON with GIN index support, key-based access, and a full set of operators for containment and path queries. It's the right choice when data has variable schema - feature flags per customer, product attributes that vary by category, audit records. The trade-off is that jsonb columns resist normalization: you can't enforce referential integrity on a value inside a JSON document, and queries that need to extract and filter on deeply nested fields can be slow without careful index design.
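A sketch of the variable-attributes case (the products table and its columns are assumptions):

CREATE TABLE products (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name       text NOT NULL,
    attributes jsonb NOT NULL DEFAULT '{}'
);
CREATE INDEX products_attributes_idx ON products USING gin (attributes);

-- Containment query served by the GIN index:
SELECT id, name FROM products WHERE attributes @> '{"color": "red"}';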
json (without the 'b') stores the raw text of the JSON string and re-parses it on every access. There's almost no reason to use json over jsonb unless you need to preserve exact whitespace and key ordering. hstore predates jsonb and supports only flat string key-value pairs. It's still in the ecosystem but jsonb is strictly more capable for new work.
Range types are one of PostgreSQL's most underused features. int4range, int8range, numrange, tsrange, tstzrange, and daterange each represent an interval of their base type, with configurable inclusivity on each bound. They come with built-in operators for overlap (&&), containment (@>), and adjacency (-|-), and GiST indexes support these operators efficiently. A hotel booking table using daterange instead of separate check_in and check_out columns can enforce a no-overlap rule with a single exclusion constraint:
-- Combining an integer equality condition with a range overlap in a single
-- GiST exclusion constraint requires the btree_gist extension:
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE bookings (
    room_id     integer NOT NULL,
    stay_period daterange NOT NULL
);

ALTER TABLE bookings
ADD CONSTRAINT no_overlapping_bookings
EXCLUDE USING gist (room_id WITH =, stay_period WITH &&);
Implementing the same logic with two date columns requires either application-level validation or a trigger - neither of which gets you the same correctness guarantees at the storage layer.
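With the constraint in place, an overlapping insert fails at the database rather than slipping through (the room number and dates here are arbitrary):

INSERT INTO bookings VALUES (101, daterange('2025-07-01', '2025-07-05'));
INSERT INTO bookings VALUES (101, daterange('2025-07-04', '2025-07-08'));
-- ERROR: conflicting key value violates exclusion constraint "no_overlapping_bookings"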
Composite types let you define a named tuple type and use it as a column type. They're useful for structured sub-objects that appear across multiple tables - think address with street, city, country, and postal code fields. Composite columns can be indexed, passed to functions, and queried with dot notation. They don't give you joins or foreign keys against the sub-fields, so they're best suited to embedded value objects rather than entities.
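A brief sketch (the type and table names are illustrative):

CREATE TYPE address AS (
    street      text,
    city        text,
    country     text,
    postal_code text
);

CREATE TABLE suppliers (
    id       bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name     text NOT NULL,
    shipping address
);

-- Dot notation needs parentheses around the column reference:
SELECT name, (shipping).city FROM suppliers WHERE (shipping).country = 'DE';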
Type Selection in Practice
The general principle: pick the most specific type that correctly models your data. Use timestamptz over timestamp, numeric over floating-point for money, text over varchar(n) for strings, and boolean over integers for flags. Reach for jsonb when schema flexibility is genuinely needed, not as an escape hatch from normalization. Use range types when your domain has intervals with overlap semantics - they'll save you from writing and maintaining constraint logic elsewhere.
Schema changes in PostgreSQL can be expensive. Changing varchar(100) to text is cheap (it's a catalog change), but changing integer to bigint on a large table rewrites every row. Migrating timestamp to timestamptz on a production table with millions of rows requires careful data conversion to avoid silently shifting stored times. Getting types right at design time is cheaper than fixing them later.
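A hedged sketch of that last migration, assuming the stored values were meant to be UTC (the audit_log table and occurred_at column are illustrative, and the zone must match whatever the old values actually represented):

-- Reinterpret naive timestamps as UTC while changing the column type;
-- this still rewrites the table, so plan it like any heavy migration.
ALTER TABLE audit_log
    ALTER COLUMN occurred_at TYPE timestamptz
    USING occurred_at AT TIME ZONE 'UTC';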