PostgreSQL Writer causing crashes?

Hello everyone, I’m currently running Zeek 6.0.2 with the PostgreSQL Writer plugin, and Zeek crashes frequently. It crashed at around 05:00 AM today. The errors suggest that my PostgreSQL server had died, but the server is still running normally. I’m not sure how to investigate this issue further or what specific approaches I can take to troubleshoot it.

> [root@thx-metagen 2025-10-29]# ll http*
> -rw-r--r--. 1 root root 17504 Oct 29 00:59 http.00:00:00-01:00:00.log
> -rw-r--r--. 1 root root 15152 Oct 29 01:59 http.01:00:00-02:00:00.log
> -rw-r--r--. 1 root root  9156 Oct 29 02:53 http.02:00:00-03:00:00.log
> -rw-r--r--. 1 root root 10677 Oct 29 03:53 http.03:00:00-04:00:00.log
> -rw-r--r--. 1 root root 17776 Oct 29 04:55 http.04:00:00-05:00:00.log
> -rw-r--r--. 1 root root  2991 Oct 29 05:07 http.05:00:00-05:07:27.log
[root@thx-metagen 2025-10-29]# cat reporter.05\:00\:00-05\:07\:27.log | grep "termi"
{"_worker_name":"worker-2","ts":1761681999.731424,"level":"Reporter::INFO","message":"received termination signal","location":""}
{"_worker_name":"worker-1","ts":1761681587.821689,"level":"Reporter::INFO","message":"received termination signal","location":""}
{"_worker_name":"zeek","ts":1761682042.143874,"level":"Reporter::ERROR","message":"tcp_payload/Log::WRITER_POSTGRESQL: Command failed: server closed the connection unexpectedly\n\tThis probably means the server terminated abnormally\n\tbefore or while processing the request.\n\n","location":""}
{"_worker_name":"zeek","ts":1761682042.143874,"level":"Reporter::ERROR","message":"http_payload/Log::WRITER_POSTGRESQL: Command failed: server closed the connection unexpectedly\n\tThis probably means the server terminated abnormally\n\tbefore or while processing the request.\n\n","location":""}
{"_worker_name":"zeek","ts":1761682042.161362,"level":"Reporter::ERROR","message":"conn/Log::WRITER_POSTGRESQL: Command failed: server closed the connection unexpectedly\n\tThis probably means the server terminated abnormally\n\tbefore or while processing the request.\n\n","location":""}
{"_worker_name":"zeek","ts":1761682042.161362,"level":"Reporter::ERROR","message":"http/Log::WRITER_POSTGRESQL: Command failed: server closed the connection unexpectedly\n\tThis probably means the server terminated abnormally\n\tbefore or while processing the request.\n\n","location":""}
{"_worker_name":"proxy","ts":1761682044.093353,"level":"Reporter::INFO","message":"received termination signal","location":""}
{"_worker_name":"manager","ts":1761682046.074759,"level":"Reporter::INFO","message":"received termination signal","location":""}
{"_worker_name":"zeek","ts":1761682047.970059,"level":"Reporter::INFO","message":"received termination signal","location":""}

[root@thx-metagen 2025-10-29]# systemctl status postgresql-15.service 
● postgresql-15.service - PostgreSQL 15 database server
     Loaded: loaded (/usr/lib/systemd/system/postgresql-15.service; enabled; preset: disabled)
     Active: active (running) since Tue 2025-10-28 18:01:41 JST; 15h ago
       Docs: https://www.postgresql.org/docs/15/static/
   Main PID: 1490 (postmaster)
      Tasks: 9 (limit: 254812)
     Memory: 342.0M
        CPU: 2h 15min 59.176s
     CGroup: /system.slice/postgresql-15.service
             ├─  1490 /usr/pgsql-15/bin/postmaster -D /var/lib/pgsql/15/data/
             ├─  1542 "postgres: logger "
             ├─278713 "postgres: checkpointer "
             ├─278714 "postgres: background writer "
             ├─278733 "postgres: walwriter "
             ├─278734 "postgres: autovacuum launcher "
             ├─278735 "postgres: pg_cron launcher "
             ├─278736 "postgres: logical replication launcher "
             └─444887 "postgres: zeek_user zeek_logs 127.0.0.1(37284) idle in transaction"

Notice: journal has been rotated since unit was started, output may be incomplete.

It looks like, by default, if the server restarts spuriously, the writer thread terminates (because DoWrite() returns false).

Setting continue_on_errors might fix it. You can provoke the failure by restarting the PostgreSQL server in a testing environment. I’m not very familiar with the writer or libpq. If setting continue_on_errors to "T" doesn’t do it, the quoted code above might need more logic for recovery/reconnect.
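
As a starting point for the investigation, it may help to put the reporter.log entries in time order, since the grep output above is not sorted. The following is a minimal Python sketch, not part of Zeek or the plugin. It assumes the JSON field names seen in the pasted log (`_worker_name`, `ts`, `level`, `message`) and a JST (UTC+9) local time zone, taken from the systemctl output. The sample lines are abbreviated copies of the entries in the question, and `timeline` is a hypothetical helper name.

```python
import json
from datetime import datetime, timedelta, timezone

# Assumption: the host runs in JST (UTC+9), as the systemctl output suggests.
JST = timezone(timedelta(hours=9), "JST")

# Abbreviated copies of reporter.log entries from the question;
# the multi-line libpq error text is shortened to its first line.
SAMPLE = """\
{"_worker_name":"worker-2","ts":1761681999.731424,"level":"Reporter::INFO","message":"received termination signal"}
{"_worker_name":"worker-1","ts":1761681587.821689,"level":"Reporter::INFO","message":"received termination signal"}
{"_worker_name":"zeek","ts":1761682042.143874,"level":"Reporter::ERROR","message":"tcp_payload/Log::WRITER_POSTGRESQL: Command failed: server closed the connection unexpectedly"}
{"_worker_name":"manager","ts":1761682046.074759,"level":"Reporter::INFO","message":"received termination signal"}
"""


def timeline(lines, tz=JST):
    """Return (local time, node, level, first message line) tuples sorted by ts."""
    entries = [json.loads(line) for line in lines if line.strip()]
    entries.sort(key=lambda e: e["ts"])
    rows = []
    for e in entries:
        when = datetime.fromtimestamp(e["ts"], tz).strftime("%H:%M:%S")
        level = e["level"].split("::")[-1]  # "Reporter::ERROR" -> "ERROR"
        first_line = (e["message"].splitlines() or [""])[0]
        rows.append((when, e["_worker_name"], level, first_line))
    return rows


if __name__ == "__main__":
    for when, node, level, msg in timeline(SAMPLE.splitlines()):
        print(f"{when} {node:<9} {level:<5} {msg}")
```

To run it against the real file, pass the open file object instead of `SAMPLE.splitlines()`. On the entries shown here, worker-1 logs its termination signal at 04:59:47, several minutes before the writer errors at 05:07:22, so the PostgreSQL server log for that same window is probably worth comparing.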
