PostgreSQL I/O Error (SQLSTATE 58030)

PostgreSQL raises SQLSTATE 58030 (io_error) when it encounters a failure performing a low-level I/O operation — most commonly a read or write to a data file, WAL segment, or temporary file. In logs and on the client you will typically see messages such as:

ERROR:  could not write to file "pg_wal/000000010000000000000001": No space left on device
ERROR:  could not read block 42 in file "base/16384/1259": read only 0 of 8192 bytes
ERROR:  could not fsync file "pg_wal/000000010000000000000001": Input/output error
PANIC:  could not write to file "pg_wal/000000010000000000000001": No space left on device

The condition name is io_error, and it belongs to SQLSTATE class 58, System Error (errors external to PostgreSQL itself).

What This Error Means

SQLSTATE class 58 covers errors that originate outside of PostgreSQL's own logic — the database engine asked the operating system to perform an I/O operation and the OS returned an error. Unlike class 22 (data exceptions) or class 42 (syntax/permission errors), a class 58 error does not indicate a problem with a query or schema; it indicates the underlying storage subsystem has failed or is unable to fulfil the request.

When PostgreSQL writes a dirty page to a heap or index file, flushes a WAL segment to disk, or reads a page during a query, it uses OS-level system calls such as read(), write(), and fsync(). If one of these fails with EIO (reported as "Input/output error"), PostgreSQL raises 58030. Other errno values are classified under different SQLSTATEs, for example ENOSPC ("No space left on device") as 53100 (disk_full), although they tend to appear in the same incidents and call for the same first checks. In every case the server logs the file path, usually the block number or offset, and the OS error string.
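
Which SQLSTATE a failed file operation receives depends on the errno the OS returned. The sketch below approximates the classification done by errcode_for_file_access() in the PostgreSQL source (src/backend/utils/error/elog.c). It is written from a reading of that function and should be checked against your server version; the shell function name sqlstate_for_errno is hypothetical.

```shell
# Approximate errno -> SQLSTATE classification for file access errors.
# sqlstate_for_errno is an illustrative helper, not a PostgreSQL tool.
sqlstate_for_errno() {
  case "$1" in
    EIO)                      echo "58030" ;;  # io_error
    ENOENT)                   echo "58P01" ;;  # undefined_file
    EEXIST)                   echo "58P02" ;;  # duplicate_file
    ENOSPC)                   echo "53100" ;;  # disk_full
    ENOMEM)                   echo "53200" ;;  # out_of_memory
    ENFILE|EMFILE)            echo "53000" ;;  # insufficient_resources
    EPERM|EACCES|EROFS)       echo "42501" ;;  # insufficient_privilege
    ENOTDIR|EISDIR|ENOTEMPTY) echo "42809" ;;  # wrong_object_type
    *)                        echo "XX000" ;;  # internal_error (everything else)
  esac
}

# Usage:
#   sqlstate_for_errno EIO      # prints 58030
#   sqlstate_for_errno ENOSPC   # prints 53100
```

In practice this means a hardware-level EIO is the case that surfaces as 58030, while a full disk or an exhausted descriptor table is reported under class 53.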

For write failures on WAL segments, PostgreSQL may escalate to PANIC and shut down to prevent data inconsistency — it cannot safely continue if it cannot guarantee that committed transactions have been written to durable storage. For read failures on data pages, the error is returned to the affected query and the transaction is aborted, but the postmaster typically remains running unless the failure affects a critical system catalog or a background worker.

Common Causes

  1. Disk full or filesystem quota exhausted. The most common cause of file write failures. PostgreSQL's WAL, temporary files, or heap files cannot be extended because the filesystem has no free space. Check df -h on the filesystem holding the PostgreSQL data directory immediately.

  2. Hardware I/O error. A failing disk, degraded RAID array, or faulty storage controller returns EIO to the OS. The kernel typically logs these in dmesg or /var/log/syslog before PostgreSQL sees them.

  3. Network filesystem (NFS/EFS/CIFS) interruption. Storing PGDATA on a network-attached filesystem that experiences a network partition or timeout causes I/O calls to return errors. PostgreSQL is not designed for unreliable networked storage.

  4. Filesystem errors or corruption. An unclean shutdown, kernel bug, or power failure can leave the filesystem in a state where specific inodes or blocks are unreadable. Running fsck on an unmounted filesystem may reveal these.

  5. OS-level file descriptor or resource limit. If the process hits ulimit -n (open file descriptors) or ulimit -f (file size), file operations can fail with EMFILE or EFBIG respectively, and the failure is logged in the same "could not ..." form as other I/O errors.

  6. Permissions change or file removed while running. If a WAL segment or data file is deleted or made unwritable while PostgreSQL has it open, subsequent writes to that file descriptor will fail.

  7. Misconfigured or failing storage in cloud environments. Throttled EBS volumes (AWS), underprovisioned Azure Disks, or GCP Persistent Disks that exceed IOPS/throughput limits usually show up as high latency or stalled I/O; an impaired or detached volume can return outright I/O errors.

How to Fix io_error

  1. Check available disk space first.

    # Adjust the paths to your data directory (see: SHOW data_directory;)
    df -h /var/lib/postgresql
    du -sh /var/lib/postgresql/*/*/pg_wal
    

    If the disk is full, free space immediately. Common culprits are an oversized pg_wal directory (replication slot lag), large temporary files, or unrotated log files.

  2. Inspect OS-level I/O errors in system logs.

    dmesg -T | grep -iE "i/o error|\bata[0-9]|scsi|blk_update"
    journalctl -k | grep -i "error"
    

    Hardware-level errors appearing here confirm a failing disk or controller and require hardware replacement or migration before restarting PostgreSQL.

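  3. Check PostgreSQL logs for the exact file and block number. PostgreSQL logs the file path and the OS error string. A path such as base/16384/12345 encodes the database OID (16384) and the relation's filenode (12345):

    -- Convert a relfilenode path like base/16384/12345 to a table name.
    -- Run while connected to the database whose OID is 16384
    -- (find it with: SELECT datname FROM pg_database WHERE oid = 16384;)
    SELECT relname, relkind
    FROM pg_class
    WHERE oid = pg_filenode_relation(0, 12345);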
    
  4. Increase file descriptor limits if relevant.

    # Check current limits for the postmaster (oldest postgres process)
    grep "open files" /proc/$(pgrep -o -x postgres)/limits

    # In /etc/security/limits.conf (applies to PAM login sessions):
    # postgres  soft  nofile  65536
    # postgres  hard  nofile  65536
    # Under systemd, set the limit in the service unit instead:
    # LimitNOFILE=65536
    
  5. Resolve replication slot lag to shrink pg_wal.

    -- Identify lagging or stale replication slots consuming WAL
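    -- Sort by the raw byte difference; sorting the pretty-printed text would order it alphabetically
    SELECT slot_name, active,
           pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS lag
    FROM pg_replication_slots
    ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC NULLS LAST;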
    
    -- Drop a stale slot if safe to do so
    SELECT pg_drop_replication_slot('slot_name');
    
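  6. After resolving a PANIC shutdown, verify data integrity. If PostgreSQL shut down with PANIC due to a WAL write failure, it will perform crash recovery on restart. Review the startup logs carefully. Running VACUUM and ANALYZE afterwards refreshes visibility information and statistics but does not verify integrity; to check for corruption, use the amcheck extension (or pg_amcheck in PostgreSQL 14 and later) on affected tables and indexes, and pg_checksums --check on a cleanly stopped cluster if data checksums are enabled.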

  7. For cloud storage, check for throttling metrics. In AWS, inspect BurstBalance and VolumeQueueLength for EBS volumes. Consider provisioning higher IOPS or migrating to io2 volume types for production workloads.
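
Steps 1 and 3 above lend themselves to small helpers. The following is a minimal sketch, assuming POSIX df -P output and the default-tablespace path layout base/<database OID>/<filenode>; the function names fs_usage_pct and parse_relpath are hypothetical.

```shell
# fs_usage_pct: read `df -P` output on stdin and print the capacity column
# of the first filesystem listed (line 2), without the percent sign.
# Assumes the filesystem name contains no spaces.
fs_usage_pct() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# parse_relpath: split a data file path taken from a log message into
# "<database OID> <filenode>", dropping any fork suffix (_fsm, _vm, _init)
# and segment suffix (.1, .2, ...). Prints nothing for other paths.
parse_relpath() {
  printf '%s\n' "$1" |
    sed -n -E 's#^base/([0-9]+)/([0-9]+)(_[a-z]+)?(\.[0-9]+)?$#\1 \2#p'
}

# Usage:
#   df -P /var/lib/postgresql | fs_usage_pct
#   parse_relpath "base/16384/1259"        # prints "16384 1259"
#   parse_relpath "base/16384/12345_fsm"   # prints "16384 12345"
```

The two numbers from parse_relpath can then be used to look up the database name in pg_database and the relation via pg_filenode_relation() while connected to that database.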

Additional Information

  • SQLSTATE class 58 (System Error) includes 58000 (system_error), 58030 (io_error), 58P01 (undefined_file), and 58P02 (duplicate_file). The 58030 code is specifically for OS-reported I/O failures.
  • A write failure to a WAL file results in PANIC and server shutdown rather than a clean error return — PostgreSQL cannot continue operating if it cannot durably record committed transactions.
  • The zero_damaged_pages GUC parameter (default off, superuser only) controls what happens when PostgreSQL detects a damaged page, such as one with an invalid page header. By default it raises an error and aborts the transaction; with SET zero_damaged_pages = on it reports a warning, zeroes the damaged page in memory, and continues. This can let a query proceed past a corrupt page, but the rows on that page are lost, so it should only be used as a last resort during disaster recovery.
  • Client libraries and ORMs surface this as a standard database error with SQLSTATE 58030. In psycopg2 it appears as psycopg2.errors.IoError (psycopg.errors.IoError in psycopg 3); in JDBC it appears as a PSQLException with getSQLState() returning "58030".
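  • In PostgreSQL 14 and later, recovery_init_sync_method selects how the server syncs the data directory before crash recovery (fsync or, on Linux, syncfs). Whether a failed fsync of a data file is treated as fatal is governed by data_sync_retry: with the default of off, such a failure raises PANIC so that recovery can replay the affected changes from WAL.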
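
Scripts that wrap database calls often branch on the SQLSTATE rather than on message text. A minimal sketch of such a branch is shown below; the function name classify_sqlstate and the category labels it prints are illustrative assumptions, not anything defined by PostgreSQL.

```shell
# classify_sqlstate: map a five-character SQLSTATE to a coarse category
# that a wrapper script could use to decide how to react.
classify_sqlstate() {
  case "$1" in
    58030) echo "io_error" ;;                # OS-reported I/O failure
    58*)   echo "system_error" ;;            # other class 58 conditions
    53*)   echo "insufficient_resources" ;;  # e.g. 53100 disk_full
    *)     echo "other" ;;
  esac
}

# Usage:
#   classify_sqlstate 58030   # prints io_error
#   classify_sqlstate 53100   # prints insufficient_resources
```

A reasonable policy is to alert an operator on io_error instead of retrying automatically, since the cause lies in the storage layer and not in the statement. In psql 11 and later, the code of the most recent statement should be available in the SQLSTATE variable.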

Frequently Asked Questions

Why does PostgreSQL shut down completely (PANIC) instead of just failing the query?

WAL write failures are treated as unrecoverable because PostgreSQL cannot guarantee ACID durability if it cannot write the WAL. If a COMMIT appeared successful to a client but the WAL record was never written to disk, a subsequent crash would silently lose that committed transaction. To prevent this inconsistency, PostgreSQL shuts down immediately on WAL I/O failure.

Can I recover data after a 58030 error caused by a failed disk?

If the disk failure affected data files but the WAL is intact on a separate device, you may be able to restore from a base backup and replay WAL. If both data and WAL are on the same failing device, your options depend on what backups you have. Always use pg_basebackup or a snapshot-based backup strategy with WAL archiving to a separate location.

The error only happens under heavy write load. Is that an I/O throughput problem?

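Possibly. Under heavy load, PostgreSQL issues more writes and fsync() calls and may hit storage throughput or IOPS limits. This usually causes stalls rather than errors, but overloaded or degraded storage can also start returning errors. Monitor buffers_checkpoint, checkpoint_write_time, and checkpoint_sync_time in pg_stat_bgwriter (in PostgreSQL 17 and later these moved to pg_stat_checkpointer as buffers_written, write_time, and sync_time) to understand checkpoint pressure, and consider tuning checkpoint_completion_target, max_wal_size, or upgrading storage.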

How do I distinguish a one-time transient I/O error from a failing disk?

A single occurrence that does not repeat — especially after a brief filesystem hiccup or NFS reconnect — may be transient. Repeated occurrences on the same file path or block number, combined with kernel-level error messages in dmesg, strongly indicate hardware failure. Run S.M.A.R.T. diagnostics (smartctl -a /dev/sdX) and check RAID controller event logs if applicable.
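
One way to spot repeats is to count how often each file path appears in the error messages. The sketch below assumes log lines in the "could not read/write/fsync ... file "<path>"" form shown at the top of this page; the function name count_error_paths is hypothetical.

```shell
# count_error_paths: read PostgreSQL log lines on stdin and print
# "<count> <path>" for every file named in a read/write/fsync failure,
# most frequent first. Assumes paths contain no spaces.
count_error_paths() {
  sed -n -E '/could not (read|write|fsync)/ s/.*file "([^"]+)".*/\1/p' |
    sort | uniq -c | sort -k1,1nr -k2 |
    awk '{ print $1, $2 }'
}

# Usage (the log file location is an assumption; adjust to your setup):
#   count_error_paths < /var/log/postgresql/postgresql.log
```

A path that keeps climbing in this list across days is a stronger signal of failing hardware than a single entry.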
