Invalid page header

Ian Burrell

2004-07-22 17:29:48 UTC

I get the following error message when doing a select on a table:

ERROR: invalid page header in block 295 of relation "reported_titles"

I found some messages that said this means a block of this table is
corrupt. I found some suspicious lines in the server log just before:

ERROR: could not access status of transaction 3651584
DETAIL: could not open file "/usr/local/pgsql/data/pg_clog/0003": No such file or directory
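
As an aside on the log lines above, the file name 0003 is consistent with the transaction number in the error. The following is a minimal sketch of that arithmetic, assuming the stock PostgreSQL layout (8192-byte pages, two status bits per transaction, 32 pages per pg_clog segment file); none of these constants are stated in the thread, and the function name is made up for illustration.

```python
# Sketch: which pg_clog segment file would hold the status of a given
# transaction ID? All three constants are assumptions (PostgreSQL defaults).
BLOCK_SIZE = 8192          # bytes per page
XACTS_PER_BYTE = 4         # two status bits per transaction
PAGES_PER_SEGMENT = 32     # pages per segment file


def clog_segment_name(xid: int) -> str:
    """Return the 4-hex-digit segment file name covering transaction `xid`."""
    xacts_per_page = BLOCK_SIZE * XACTS_PER_BYTE            # 32768
    xacts_per_segment = xacts_per_page * PAGES_PER_SEGMENT  # 1048576
    return format(xid // xacts_per_segment, "04X")


print(clog_segment_name(3651584))  # prints 0003
```

Under those assumptions, transaction 3651584 falls in the fourth segment (numbered from zero), which matches the file the server failed to open.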

How do I fix this corruption? I have dumped as much of the databases as
I can including about half of this table. Only this table is corrupt.

What could cause the corruption? We are using custom C code. Could
bugs in this be causing it? Or are hardware problems more likely?

- Ian

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Tom Lane

2004-07-22 18:24:07 UTC

Post by Ian Burrell
ERROR: invalid page header in block 295 of relation "reported_titles"
How do I fix this corruption?

You can zap just the failed block by turning on "zero_damaged_pages";
that will at least allow you to recover the rest of the table. If you
want to try harder, you could look at the damaged page with pg_filedump
(http://sources.redhat.com/rhdb/) or a similar tool and try to intuit
how to fix it manually.
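
For anyone who wants to pull the raw bytes of the damaged block out of a copy of the table file before examining it, here is a rough sketch. It assumes the default 8192-byte block size and 1 GB relation segment files; both are build-time settings not confirmed in the thread, and the helper names are hypothetical. It only reads, and it should be pointed at a backup copy, never at the live data directory.

```python
# Sketch: locate and read one block from a copy of a relation file.
# BLOCK_SIZE and SEGMENT_SIZE are assumed defaults, not values from the thread.
BLOCK_SIZE = 8192
SEGMENT_SIZE = 1024 ** 3   # relation files are split into 1 GB segments


def block_location(block_no, block_size=BLOCK_SIZE, segment_size=SEGMENT_SIZE):
    """Return (segment_index, byte_offset_within_segment) for a block number."""
    blocks_per_segment = segment_size // block_size
    segment = block_no // blocks_per_segment
    offset = (block_no % blocks_per_segment) * block_size
    return segment, offset


def read_block(path, block_no, block_size=BLOCK_SIZE):
    """Read one block from `path`, which must be the segment file holding it."""
    _, offset = block_location(block_no, block_size)
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(block_size)
    if len(data) != block_size:
        raise ValueError("file is too short to contain that block")
    return data


print(block_location(295))  # prints (0, 2416640)
```

With these assumptions, block 295 sits in the first segment file at byte offset 2416640.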

Post by Ian Burrell
What could cause the corruption? We are using custom C code. Could
bugs in this be causing it? Or are hardware problems more likely?

Hmm. A scribble-on-memory kind of bug could cause this, but in my
experience it's unusual for coding errors to trash the disk buffers ---
that's a relatively small part of your address space, and usually a
memory clobber will crash the backend elsewhere before it hits a disk
buffer. (BTW, one reason we force a database restart after a backend
crash is in hopes of not letting any such clobber make it to disk. The
contents of shared disk buffers are simply thrown away in a restart.)

It would probably be worth your while to look at the damaged page with
pg_filedump before you zap it. The symptoms of hardware misfeasance and
software errors are different enough that you can often tell which
theory to believe by examining the bits.

regards, tom lane

Ian Burrell

2004-07-22 22:10:41 UTC

Post by Tom Lane
You can zap just the failed block by turning on "zero_damaged_pages";
that will at least allow you to recover the rest of the table. If you
want to try harder, you could look at the damaged page with pg_filedump
(http://sources.redhat.com/rhdb/) or a similar tool and try to intuit
how to fix it manually.

I zapped the damaged block. It didn't seem to affect the rows in the
table. My suspicion is that the page only contained deleted rows since
the table had many updates done recently.

Post by Tom Lane
Hmm. A scribble-on-memory kind of bug could cause this, but in my
experience it's unusual for coding errors to trash the disk buffers ---
that's a relatively small part of your address space, and usually a
memory clobber will crash the backend elsewhere before it hits a disk
buffer. (BTW, one reason we force a database restart after a backend
crash is in hopes of not letting any such clobber make it to disk. The
contents of shared disk buffers are simply thrown away in a restart.)
It would probably be worth your while to look at the damaged page with
pg_filedump before you zap it. The symptoms of hardware misfeasance and
software errors are different enough that you can often tell which
theory to believe by examining the bits.

I used pg_filedump on a backup of the database files. The block looks
like it is mostly zero bytes with a few x02 bytes thrown in just to be
confusing.

- Ian

Tom Lane

2004-07-22 22:30:20 UTC

Post by Ian Burrell

Post by Tom Lane
It would probably be worth your while to look at the damaged page with
pg_filedump before you zap it. The symptoms of hardware misfeasance and
software errors are different enough that you can often tell which
theory to believe by examining the bits.

I used pg_filedump on a backup of the database files. The block looks
like it is mostly zero bytes with a few x02 bytes thrown in just to be
confusing.

My interpretation of that would be a hardware glitch. A software
problem would be more likely to look like copying the wrong data
into the block, or possibly zeroing out the block when it shouldn't
--- but the sprinkling of x02's rules out a misaimed memset().
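
The distinction drawn above (a cleanly zeroed block versus a mostly-zero block with a few stray bytes) can be checked mechanically on a dumped page. This is only an illustrative sketch: the function name, the category labels, and the 1% threshold are arbitrary assumptions, and the result is a hint about what the bytes look like, not a diagnosis.

```python
from collections import Counter


def classify_block(data: bytes, sparse_threshold: float = 0.01) -> str:
    """Roughly describe a block's contents.

    "all-zero"     : every byte is zero, as a stray memset() would leave it
    "mostly-zero with stray bytes" : a small fraction of nonzero bytes
    "populated"    : anything else, e.g. real or miscopied data
    The threshold (fraction of nonzero bytes) is an arbitrary assumption.
    """
    nonzero = Counter(b for b in data if b != 0)
    total_nonzero = sum(nonzero.values())
    if total_nonzero == 0:
        return "all-zero"
    if total_nonzero <= len(data) * sparse_threshold:
        return "mostly-zero with stray bytes"
    return "populated"


# A synthetic 8192-byte block with a handful of 0x02 bytes scattered in it.
sample = bytearray(8192)
for pos in (17, 900, 4096, 7000):
    sample[pos] = 0x02
print(classify_block(bytes(sample)))  # prints mostly-zero with stray bytes
```

A block like the one described in this thread would land in the middle category, which is the pattern a misaimed memset() cannot produce on its own.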

Time to break out the RAM and disk test programs ...

regards, tom lane
