Discussion:
postmaster stops
Ashley Maher
2004-10-12 03:29:37 UTC
G'day,

I installed postgresql-7.4.3 about three months ago.

Twice, with no known reason the postmaster has stopped.

I searched the lists for this but obviously I'm not searching for the
right thing. (A couple of thousand replies, little relevance.)

Hints, Ideas appreciated.

Regards,

Ashley

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org
Tom Lane
2004-10-12 03:54:54 UTC
Post by Ashley Maher
I installed postgresql-7.4.3 about three months ago.
Twice, with no known reason the postmaster has stopped.
The postmaster is not known for "just stopping". There will certainly
be some evidence either in the postmaster's log (if it aborted for some
internal reason) or in the kernel log (if you got hit by the Linux OOM
killer) or at worst in a core dump file left by the postmaster (though
I really seriously doubt that one).

Now it's entirely possible that you're starting the postmaster in a way
that destroys the evidence. Check to see if you are sending postmaster
stderr to /dev/null, and if so put it someplace more useful. Also make
sure the postmaster is not being launched under "ulimit -c 0", and that
it's being started in a directory that postgres is allowed to write (so
that a core file can be created, if worst comes to worst).
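
The two core-file preconditions above can be checked from the same environment that launches the postmaster. The following is a minimal sketch, not part of the original post: the function name `core_dump_readiness` is hypothetical, and it only reports the limits inherited by the Python process itself, so it is meaningful only when run from the same shell or init script context as the start script.

```python
import os
import resource


def core_dump_readiness(start_dir="."):
    """Report whether a process started from start_dir could leave a core file.

    Hypothetical helper: it inspects the limits of the *current* process,
    which are inherited from whatever shell or script launched it.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    return {
        "core_soft_limit": soft,   # 0 corresponds to "ulimit -c 0"
        "core_hard_limit": hard,
        "core_enabled": soft != 0,  # RLIM_INFINITY is nonzero, so unlimited counts
        "dir_writable": os.access(start_dir, os.W_OK),
    }


if __name__ == "__main__":
    for key, value in core_dump_readiness(os.getcwd()).items():
        print(f"{key}: {value}")
```

If `core_enabled` is false or `dir_writable` is false, a crash would most likely leave no core file behind.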

When you've got some evidence to look at, we'll be glad to help you
interpret it, but there's no point in speculating without evidence.

regards, tom lane

Ashley Maher
2004-10-19 03:14:10 UTC
Post by Tom Lane
When you've got some evidence to look at, we'll be glad to help you
interpret it, but there's no point in speculating without evidence.
Thanks Tom,

My original post stated I'd had the postmaster stop without explanation.
I now have log files. I can see nothing of interest in messages or syslog.

I start postgresql with the script by Ryan Kirkpatrick (basis used by
the BSD script from the postgresql site). This logs to
/usr/local/pgsql/data/serverlog.

This is an extract:

LOG: unexpected EOF on client connection
LOG: unexpected EOF on client connection
LOG: unexpected EOF on client connection
LOG: unexpected EOF on client connection
FATAL: lock file "/usr/local/pgsql/data/postmaster.pid" already exists
HINT: Is another postmaster (PID 1427) running in data directory "/usr/local/pgsql/data"?
LOG: database system was interrupted at 2004-09-13 11:29:44 EST
LOG: checkpoint record is at 0/16982C0
LOG: redo record is at 0/16982C0; undo record is at 0/0; shutdown FALSE
LOG: next transaction ID: 37412; next OID: 54829
LOG: database system was not properly shut down; automatic recovery in progress
LOG: record with zero length at 0/1698300
LOG: redo is not required
LOG: database system is ready



As you can see, I tried to restart the postmaster before removing the pid
file. There are a lot of "unexpected EOF on client connection" messages.
I've searched for the cause of this error without much success, hoping it
is the reason the postmaster dies. Is this a hint to my problem?
As I said, I've "googled" but not gotten very far.
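
Before removing a leftover postmaster.pid by hand, it helps to confirm that the PID it names is really gone. The following is a minimal sketch, not from the thread: the helper name `postmaster_pid_status` is hypothetical, and it assumes the PID is the first line of the file, as the HINT message in the log extract suggests.

```python
import os


def postmaster_pid_status(pid_file):
    """Return (pid, alive) for the PID on the first line of pid_file.

    Returns (None, False) if the file does not exist. Hypothetical helper;
    assumes the first line of the lock file holds the PID.
    """
    try:
        with open(pid_file) as f:
            first_line = f.readline().strip()
    except FileNotFoundError:
        return None, False

    pid = int(first_line)
    if pid <= 0:
        raise ValueError(f"unexpected PID value in {pid_file}: {pid}")

    try:
        os.kill(pid, 0)  # signal 0 only tests for existence; nothing is delivered
    except ProcessLookupError:
        return pid, False
    except PermissionError:
        return pid, True  # the process exists but belongs to another user
    return pid, True


if __name__ == "__main__":
    # Path taken from the log extract above.
    print(postmaster_pid_status("/usr/local/pgsql/data/postmaster.pid"))
```

A live PID does not prove that the process is a postmaster, since PIDs get reused after a reboot, so the result is a hint rather than a verdict.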

Regards,

Ashley

Andrew Sullivan
2004-10-19 15:28:46 UTC
Post by Ashley Maher
My original post stated I'd had the postmaster stop without explanation.
I now have log files. I can see nothing of interest in messages or syslog.
LOG: unexpected EOF on client connection
LOG: unexpected EOF on client connection
LOG: unexpected EOF on client connection
LOG: unexpected EOF on client connection
These are cases where the client has gone away without shutting down
its connection properly. At the TCP timeout, the postmaster checks
the client, discovers it isn't there, and shuts down the connection.
So this tells you that a client disconnected abnormally, but it also
tells you that the postmaster was working correctly then.
Post by Ashley Maher
FATAL: lock file "/usr/local/pgsql/data/postmaster.pid" already exists
And this is your start up attempt.
Post by Ashley Maher
file. There is a lot of the "unexpected EOF on client connection". I've
done a search for the cause of this error without much success, in the
hope it is the reason the postmaster dies. Is this a hint to my problem?
I doubt it very much. I think you probably need to increase the
verbosity of your logging in order to make it clearer what's going
on. Also, timestamps would help: from the output that's there, you
can't tell how long it was between the last known-alive postmaster
event (unexpected EOF) and the start-up attempt.

You'll probably want to send the output through a log rotator if you
increase the verbosity: multi-gig log files aren't so good.

A
--
Andrew Sullivan | ***@crankycanuck.ca
The plural of anecdote is not data.
--Roger Brinner

Tom Lane
2004-10-19 15:51:21 UTC
Post by Andrew Sullivan
Post by Ashley Maher
LOG: unexpected EOF on client connection
These are cases where the client has gone away without shutting down
its connection properly. At the TCP timeout, the postmaster checks
the client, discovers it isn't there, and shuts down the connection.
So this tells you that a client disconnected abnormally, but it also
tells you that the postmaster was working correctly then.
Strictly speaking, it's possible that the postmaster died earlier,
leaving orphan backends that logged these messages sometime later.

My private suspicion is that the postmaster is being killed either
because of exceeding a ulimit setting or by the infamous Linux "OOM
kill" kernel bug^H^H^Hfeature. In either case, the postmaster log
will not be the place to look; instead check for a core file left behind
or for a notice in the kernel log about OOM kill.

But really the first thing to do is to check that the postmaster is
being started under reasonable ulimit settings ... and while you are at
it, make sure it's "ulimit -c unlimited" not "ulimit -c 0". The latter
would prevent a core file from being dropped. If it's not an OOM kill
situation then the core file is the only way to learn more.
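
Searching the kernel log for an OOM-kill notice can be sketched as below. This is an illustration under stated assumptions, not part of the original post: the helper name `find_oom_lines` and the regular expression are hypothetical, the exact wording of kernel OOM messages varies between kernel versions, and the log path depends on the distribution.

```python
import re

# Assumed pattern: kernel OOM notices usually mention "Out of memory"
# or "oom-killer", but the wording differs between kernel versions.
OOM_PATTERN = re.compile(r"out of memory|oom[-_ ]kill", re.IGNORECASE)


def find_oom_lines(log_lines, process_name="postmaster"):
    """Return lines that look like OOM-killer notices.

    Lines that mention process_name are sorted to the front; the relative
    order of the remaining lines is preserved.
    """
    hits = [line.rstrip("\n") for line in log_lines if OOM_PATTERN.search(line)]
    return sorted(hits, key=lambda line: process_name not in line)


if __name__ == "__main__":
    # /var/log/messages is an assumption; use whatever file holds kernel messages.
    with open("/var/log/messages", errors="replace") as log:
        for hit in find_oom_lines(log):
            print(hit)
```

An empty result only means that no line matched this pattern in that file, so a rotated log or a differently worded message could still hide the evidence.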

regards, tom lane

