PostgreSQL CREATE DATABASE: Syntax, Options, and Practical Patterns

PostgreSQL's CREATE DATABASE command is deceptively simple on the surface. A bare CREATE DATABASE mydb; works fine in development, but production deployments involve encoding choices, locale settings, template selection, and ownership that become load-bearing decisions. Getting them wrong - especially encoding and collation - can corrupt sort order in indexes or cause restore failures.

This article covers the full syntax, each option's purpose and constraints, the createdb command-line wrapper, and the architectural question of multiple databases versus schemas.

The Full Syntax and Options

The complete form of the command as of PostgreSQL 16+ is:

CREATE DATABASE name
    [ WITH ] [ OWNER [=] user_name ]
           [ TEMPLATE [=] template ]
           [ ENCODING [=] encoding ]
           [ LOCALE [=] locale ]
           [ LC_COLLATE [=] lc_collate ]
           [ LC_CTYPE [=] lc_ctype ]
           [ LOCALE_PROVIDER [=] locale_provider ]
           [ ICU_LOCALE [=] icu_locale ]
           [ ICU_RULES [=] icu_rules ]
           [ TABLESPACE [=] tablespace_name ]
           [ ALLOW_CONNECTIONS [=] allowconn ]
           [ CONNECTION LIMIT [=] connlimit ]
           [ IS_TEMPLATE [=] istemplate ];

Note: the full syntax also includes a STRATEGY parameter, added in PostgreSQL 15. It accepts WAL_LOG (the default since PG 15, which creates the database by writing WAL records page by page) or FILE_COPY (the legacy method). Using WAL_LOG is why CREATE DATABASE no longer forces a checkpoint in PostgreSQL 15 and later.

OWNER sets which role owns the database. To create a database owned by another role, you must be able to SET ROLE to that role — superusers can always do this, but so can non-superusers who are members of the target role with the SET option. If omitted, ownership defaults to the executing role. This matters for permission management - the owner can drop the database and all its objects, and can grant or revoke connection privileges.

ENCODING sets the character set for the new database. UTF8 is the right default for almost every use case today. SQL_ASCII is a common trap: it disables multibyte encoding enforcement and accepts any byte sequence, which means garbage-in, garbage-out for non-ASCII data. It also makes cross-database dumps and restores harder. Set UTF8 unless you have a specific legacy reason not to.

LC_COLLATE and LC_CTYPE control string sorting order and character classification respectively. LC_COLLATE affects ORDER BY on text columns and index sort order. LC_CTYPE affects character type classifications used by functions like upper(), lower(), and regex matching. The constraint is that these must be compatible with the encoding - for most non-C locales there is exactly one compatible encoding. The C (or POSIX) locale accepts any encoding and gives byte-order string sorting, which is faster but not useful for natural language sorting.

LOCALE is a shorthand that sets both LC_COLLATE and LC_CTYPE in a single parameter. LOCALE_PROVIDER (added in PostgreSQL 15) selects the collation library: libc (the traditional OS-level locale support) and icu (the International Components for Unicode library, which provides more predictable and portable collation) were both available from PostgreSQL 15; builtin was added in PostgreSQL 17. ICU collation is worth considering for new databases where consistent sort behavior across OS upgrades matters - libc collation can shift between OS versions, which invalidates existing indexes silently.

CONNECTION LIMIT caps concurrent connections to the database. The value -1 means unlimited. Be aware that this limit is enforced approximately under high concurrency due to race conditions, and it is not enforced against superusers or background workers. It is a soft guard, not a hard quota.

IS_TEMPLATE marks the database as a template that any role with CREATEDB privilege can clone. By default only the owner or a superuser can use a given database as a template.

template0 vs template1

Every CREATE DATABASE operation works by cloning an existing database. The default source is template1. During PostgreSQL cluster initialization, initdb creates two template databases: template1 and template0.

template1 is mutable. Any objects, extensions, or schema changes you add to it will appear in every database subsequently created from it. This is useful for site-wide defaults - for example, installing pg_stat_statements or uuid-ossp in template1 makes them available in all new databases without explicit CREATE EXTENSION calls. The flip side: if template1 accumulates unwanted objects, every new database inherits the mess.

template0 is preserved in its original state from cluster initialization and should never be modified. Its defining practical property is that you can specify different ENCODING and locale settings when cloning it, whereas cloning template1 requires matching its encoding and locale exactly. This is because template1 may contain encoding-specific or locale-specific data - copying it with mismatched settings would corrupt indexes and sort order.

The two canonical use cases for template0 are restoring a pg_dump archive (use createdb -T template0 then pg_restore) and creating a database with a non-default encoding or locale:

-- Create a database with Swedish locale, must use template0
CREATE DATABASE swedish_data
    LOCALE 'sv_SE.utf8'
    TEMPLATE template0;

-- Pristine database for restoring a dump
CREATE DATABASE restored_app
    ENCODING 'UTF8'
    TEMPLATE template0;

One more note: CREATE DATABASE cannot run inside a transaction block. It also fails if any other session is connected to the template database at the time of cloning. For template0 this is rarely an issue since connections to it are blocked by default (datallowconn = false), but it can surface when using a custom template.

The createdb Command-Line Tool

createdb is a thin shell around CREATE DATABASE that connects to the cluster and issues the SQL. It is convenient for scripting and automation without needing a SQL client:

# Basic creation
createdb myapp

# With owner, encoding, and template
createdb -O appuser -E UTF8 -T template0 myapp

# Using a specific host and port
createdb -h db.internal -p 5432 -U postgres myapp

The flags map directly to CREATE DATABASE options: -O for owner, -E for encoding, -T for template, -l for locale. All options available in the SQL command have a CLI equivalent. One difference: createdb connects to the postgres maintenance database by default (configurable via --maintenance-db=dbname). PGDATABASE, if set, controls the name of the database to create, not the maintenance connection target. The postgres database must exist - on fresh clusters it does, created by initdb alongside the templates.

Connecting to a New Database and First Steps

After creation, connect with psql:

psql -d myapp -U appuser

Or switch databases within an existing psql session:

\c myapp

The newly created database has default privileges: the owner has full access, and the PUBLIC pseudo-role has CONNECT privilege and TEMPORARY privilege by default. In many production setups you want to revoke the default CONNECT from PUBLIC and grant it explicitly:

REVOKE CONNECT ON DATABASE myapp FROM PUBLIC;
GRANT CONNECT ON DATABASE myapp TO appuser;

Note that database-level GRANT permissions are not copied from the template - every new database starts with PostgreSQL's default privilege set regardless of what the template had.

Multiple Databases vs Schemas

PostgreSQL enforces a hard boundary at the database level: a single connection can access exactly one database. Cross-database queries require dblink or postgres_fdw, both of which add latency and operational surface area. This distinction drives most of the architectural trade-off between multiple databases and multiple schemas.

Multiple schemas within one database share a connection pool, allow JOIN across schema boundaries, and simplify backup and restore - pg_dump operates per-database. They are the right model for multi-tenant applications where tenants need logical separation but the workloads are similar and the team wants shared connection pooling through tools like PgBouncer.

Multiple databases within one cluster make sense when workloads are genuinely unrelated - different applications with different owners, different encoding requirements, or different lifecycle (independent schema migrations, separate restore points). Importantly, all databases in a cluster share the same WAL stream, which means a pg_basebackup captures all databases together. If you need database-level point-in-time recovery isolation, you need separate clusters, not separate databases.

For most SaaS applications, schemas per tenant within one database - combined with search_path management and row-level security - scales further than people expect and avoids the cross-database query problem entirely. Add databases when you have genuine isolation requirements at the application, compliance, or operational level, not just as an organizational convention.