Discussion:
corrupt DB recovery
(too old to reply)
Drew Whittle
2004-08-12 03:10:16 UTC
Permalink
Hi Folks,

One of our production db's went down the other day, I have restored it
from backup to another server and now I am attempting to recover as much
of the data that was entered since the last backup.

Anyway, if I attempt to connect to the DB using psql I get the old
missing pg_clog entry error, in an attempt to fix that I created a 256k
0 filled file of the correct name.

The next attempt to connect via psql gets me a "no pg_class entry for
pg_proc" error.

If I start the db up in standalone mode the above error comes up and the
db immediately shuts down again.

I have attempted to scan across the data files for this db with
pg_filedump and it is not reporting any corrupted blocks. (but I'm not
sure I was doing it correctly)

I have two other db's that are 100% identical in everything but name and
data. (ie same structure etc), is it possible to use one of that data
files in them to over write the buggered file and fix the pg_proc error?
(and if so, is there an easy way to figure out which one, ie is pg_class
always in the 1249 data file?)

Any other suggestions that might get the data back or should I just
forget it and tell them to key it again?

Thx,

Drew


---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match
Tom Lane
2004-08-12 04:21:58 UTC
Permalink
Post by Drew Whittle
I have two other db's that are 100% identical in everything but name and
data. (ie same structure etc), is it possible to use one of that data
files in them to over write the buggered file and fix the pg_proc error?
You could try that, but unless the db's also have identical history (in
the sense of the same set of transaction numbers that committed) it's
unlikely to work. I think your problem is not that pg_class is buggered
per se, so much as that you did not have correct pg_clog data to match it.
The data files and the log files are an interlocking whole, you can't
just plug in one part at random.
Post by Drew Whittle
Any other suggestions that might get the data back or should I just
forget it and tell them to key it again?
If you did extract something from this mess, how much could you trust
it? Not a lot, because the bad-clog problem will affect visibility of
rows in the user tables too.

I'm afraid you're screwed :-(. Maybe you could learn something about
proper backup procedures though. The backup you have obviously didn't
work well --- how did you take it exactly?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend
Drew Whittle
2004-08-12 04:43:01 UTC
Permalink
Post by Tom Lane
You could try that, but unless the db's also have identical history (in
the sense of the same set of transaction numbers that committed) it's
unlikely to work. I think your problem is not that pg_class is buggered
per se, so much as that you did not have correct pg_clog data to match it.
The data files and the log files are an interlocking whole, you can't
just plug in one part at random.
I know your the expert and everything, but we will have to see if I can
get the data back or not. (I like a good challenge!)

A bit more research has given me some of the info I need and I now have
the buggered db up and running (although currently in a failing state -
but I'm fairly confident that I can fix that.)
Post by Tom Lane
Post by Drew Whittle
Any other suggestions that might get the data back or should I just
forget it and tell them to key it again?
If you did extract something from this mess, how much could you trust
it? Not a lot, because the bad-clog problem will affect visibility of
rows in the user tables too.
Only if it was a bad-clog problem. Due to the nature of that data it is
fairly easy to tell any corrupt items.
Post by Tom Lane
I'm afraid you're screwed :-(. Maybe you could learn something about
proper backup procedures though. The backup you have obviously didn't
work well --- how did you take it exactly?
Actually the backup went fine, restored fine to the alternate machine,
I'm attempting to get the data back (if any) that was input between the
backup time and the crash time.

Thanks for you past and present comments (earlier posts by you to the
list have helped me with this problem)

Thx,

Drew


---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Loading...