Dan Hrabarchuk
2004-10-07 16:25:24 UTC
Is there a FAQ or a HOWTO on recovering from database corruption? If
not, then I am willing to write it. I will need help getting the
document together because as much as I google, I come up empty handed.
What I need is something that tells you:
A> You get errors in your server log like:
PANIC: open of /psql/pg_clog/0C95 failed: No such file or directory
- or -
PANIC: ZeroFill failed to write /psql/pg_xlog/xlogtemp.27517: No space
left on device :)
- or -
LOG: server process (pid 48146) was terminated by signal 6
- or -
LOG: ReadRecord: record with zero length at 39D/A53F7800
- or -
LOG: ReadRecord: unexpected pageaddr 39A/78D4000 in log file 922,
segment 15, offset 9256960
- or -
ERROR: Invalid page header in block 2879878 of channeldata
etc.
Which are serious? What needs to be done?
B> How to find all corrupt tuples, whether they are in indexes or the
tables themselves.
C> How to eliminate the corrupt tuples. Delete them of course. But how
when the server errors on deleting them??
pgfsck looks like an awesome tool. It allows you to search areas of your
database that are rarely used. So you don't get a 1AM phone call from
the sales guy needing some report for tomorrow at 7AM and he can't get
it.
I want to have something that can be easily digested and doesn't require
advanced knowledge of postgres. What does the beginning PostgreSQL admin
do? A database disaster guide.
Thanks
Dan Hrabarchuk
-----Forwarded Message-----
TIP 7: don't forget to increase your free space map settings
not, then I am willing to write it. I will need help getting the
document together because as much as I google, I come up empty handed.
What I need is something that tells you:
A> You get errors in your server log like:
PANIC: open of /psql/pg_clog/0C95 failed: No such file or directory
- or -
PANIC: ZeroFill failed to write /psql/pg_xlog/xlogtemp.27517: No space
left on device :)
- or -
LOG: server process (pid 48146) was terminated by signal 6
- or -
LOG: ReadRecord: record with zero length at 39D/A53F7800
- or -
LOG: ReadRecord: unexpected pageaddr 39A/78D4000 in log file 922,
segment 15, offset 9256960
- or -
ERROR: Invalid page header in block 2879878 of channeldata
etc.
Which are serious? What needs to be done?
B> How to find all corrupt tuples, whether they are in indexes or the
tables themselves.
C> How to eliminate the corrupt tuples. Delete them of course. But how
when the server errors on deleting them??
pgfsck looks like an awesome tool. It allows you to search areas of your
database that are rarely used. So you don't get a 1AM phone call from
the sales guy needing some report for tomorrow at 7AM and he can't get
it.
I want to have something that can be easily digested and doesn't require
advanced knowledge of postgres. What does the beginning PostgreSQL admin
do? A database disaster guide.
Thanks
Dan Hrabarchuk
-----Forwarded Message-----
Subject: Re: Question about pgfsck
Date: Thu, 07 Oct 2004 10:38:33 +0200
The problem is obviously with the structure of the tuples. So you need
to identify it by something then does not require parsing the tuples.
Hence the suggestion of ctid or oid or something else fixed.
Good luck,
SELECT * FROM channeldata where cd_id=6268 and tstamp<???2004-09-20??? and
tstamp>???2004-09-15???;
Will cause a signal 6 and have the database backend restart.
If I change it to: DELETE FROM channeldata where cd_id=6268 and
tstamp<???2004-09-20??? and tstamp>???2004-09-15???;
I get an error and the transaction does not commit. How do you delete
the records? Or is this not the usual way.
I can recover the data after it's deleted. I just need to delete it.
Thanks again.
Dan
---------------------------(end of broadcast)---------------------------Date: Thu, 07 Oct 2004 10:38:33 +0200
The problem is obviously with the structure of the tuples. So you need
to identify it by something then does not require parsing the tuples.
Hence the suggestion of ctid or oid or something else fixed.
Good luck,
3. Finding the file with the error.
And the tablename, ergo, filename4. Finally fixing the error.
Delete the record is the usual way.tstamp>???2004-09-15???;
Will cause a signal 6 and have the database backend restart.
If I change it to: DELETE FROM channeldata where cd_id=6268 and
tstamp<???2004-09-20??? and tstamp>???2004-09-15???;
I get an error and the transaction does not commit. How do you delete
the records? Or is this not the usual way.
I can recover the data after it's deleted. I just need to delete it.
Thanks again.
Dan
TIP 7: don't forget to increase your free space map settings