vacuum slowed by syslogd

Discussion:

(too old to reply)

Michael Adler

2004-01-14 03:10:31 UTC

On many occasions, I've noticed that some PostgreSQL activity takes far longer than it previously did and that disabling syslogd addresses the symptoms.

Most recently, it took 20-60 seconds to VACUUM a small, heavily updated table, while it used to take a fraction of a second. I found syslog entries like these:

13:19:53 --Relation sometable--
13:20:03 Removed 2 tuples in 1 pages.
13:20:23 ^ICPU 0.00s/0.00u sec elapsed 0.00 sec.
13:20:54 Pages 1: Changed 1, Empty 0; Tup 4: Vac 2, Keep 0, UnUsed 13.
13:20:54 ^ITotal CPU 0.00s/0.00u sec elapsed 60.12 sec.

It took almost-exactly 60 seconds, but virtually no CPU time was used (and no disk IO). Many similar examples have "real" times that are near perfect multiples of 10 seconds (e.g. 50.09, 40.07). This is not every single VACUUM, but it is frequent.

The problem disappears when syslogd is stopped or when PostgreSQL disables syslog usage. This is very consistent and I can reproduce the problem in some installations by toggling these factors on and off.

The other way to toggle the problem on/off is to disable syslogd's udp socket (though this is a feature we'd still like to use).

This evidence normally indicates a name resolution issue, but I'm not sure how to test for that beyond using "hostname -v -i".

This has happened at several installations. We use both 7.2.1 (Debian stable) and 7.3.2 (Debian testing).

I would guess that this is not the result of a PostgreSQL deficiency, but it is most sharply affected by it. Perhaps someone can offer insight?

Thanks,
Mike Adler

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ***@postgresql.org so that your
message can get through to the mailing list cleanly

Tom Lane

2004-01-14 04:36:19 UTC

Permalink

I looked in Red Hat's bugzilla for similar issues, and found a mention
that syslogd tries to reverse-lookup the address it gets each message
from. A delay there could act as you describe. At least in recent RH
releases, you can start syslogd with the "-x" switch to disable this
lookup --- does that exist in Debian, and if so does it help?

I would expect any comparable resolution delays on the client side to
occur only once when a backend is started, not once per message, since
we only openlog() once per backend. So a delay on the syslogd side
seems a likely theory.

Something else you could do to investigate is to test whether syslogd
can log messages from other clients during these 10-second delays.
If it's hung trying to resolve a name, presumably messages from other
clients would get delayed too.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ***@postgresql.org

Michael Adler

2004-01-14 21:46:18 UTC

Permalink

Post by Tom Lane

There is no such option within Debian or the syslogd from freshmeat. Sounds nice, though.

I was able to track down a name resolution issue. The "nameserver" in /etc/resolv.conf pointed to a nonexistent host. This added ten second delays to telnetd logins and netstat -a. This seemed to be the heart of the problem, but I'm getting some mixed info from the "field" and I'm still following up.

Thanks,
Mike

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Gaetano Mendola

2004-01-15 02:46:09 UTC

Permalink

Post by Michael Adler
On many occasions, I've noticed that some PostgreSQL activity takes far longer than it previously did and that disabling syslogd addresses the symptoms.
13:19:53 --Relation sometable--
13:20:03 Removed 2 tuples in 1 pages.
13:20:23 ^ICPU 0.00s/0.00u sec elapsed 0.00 sec.
13:20:54 Pages 1: Changed 1, Empty 0; Tup 4: Vac 2, Keep 0, UnUsed 13.
13:20:54 ^ITotal CPU 0.00s/0.00u sec elapsed 60.12 sec.
It took almost-exactly 60 seconds, but virtually no CPU time was used (and no disk IO). Many similar examples have "real" times that are near perfect multiples of 10 seconds (e.g. 50.09, 40.07). This is not every single VACUUM, but it is frequent.
The problem disappears when syslogd is stopped or when PostgreSQL disables syslog usage. This is very consistent and I can reproduce the problem in some installations by toggling these factors on and off.

Did you try to put a "-" before the file log name ?
Example:

LOCAL0.* -/var/log/postgresql.log

Regards
Gaetano Mendola

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html