Discussion:
Vacuum full - disk space eaten by WAL logfiles
(too old to reply)
Thomas Madsen
2004-09-28 13:12:25 UTC
Permalink
Hi!

I ran in to a serious problem with a server running Postgresql 7.2.1. I
know that an upgrade would be in its place, but this is not an option at
the moment.

The problem started with the server running out of space (toast table
galore) due to a too low max_fsm_pages number (1,000,000 was too low on
this particular server).
I raised it to 3,000,000 and tried a "VACUUM FULL" on the table. Out
popped an error when it reached the toast table index: "Number of tuples
(xxx) not same at number in heap (XXX), recreate index". The numbers
(XXX) differed by just 2. It then started recreating the index, but
stopped short when another error popped up: "insertion of duplicate
keys"-failure.

I have subsequently attempted to reindex the tables/indexes, but it
didn't help a bit. I still got the: "Number of tuples (xxx) not same at
number in heap (XXX), recreate index"- error.

I then tried to drop the indexes on the table in question, and then run
another "VACUUM FULL". The result is this:
SNIP
------------------------------------------------------------------------
------------------------------
backend> vacuum full infected;
DEBUG: --Relation infected--
DEBUG: Pages 71795: Changed 0, reaped 16560, Empty 0, New 0; Tup
748830: Vac 0, Keep/VTL 0/0, UnUsed 81433, MinLen 142, MaxLen 2032;
Re-using: Free/Avail. Space 13279172/10747696; EndEmpty/Avail. Pages
0/21243.
CPU 2.05s/0.24u sec elapsed 69.52 sec.
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG: Rel infected: Pages: 71795 --> 71795; Tuple(s) moved: 0.
CPU 1.37s/1.23u sec elapsed 112.46 sec.
DEBUG: --Relation pg_toast_16581--
DEBUG: Pages 1517406: Changed 48938, reaped 1134683, Empty 217179, New
0; Tup 1598162: Vac 201754, Keep/VTL 0/0, UnUsed 4053916, MinLen 45,
MaxLen 2034; Re-using: Free/Avail. Space 9420709796/9419556500;
EndEmpty/Avail. Pages 0/1354671.
CPU 38.72s/5.26u sec elapsed 850.95 sec.
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
....
...
...
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
FATAL 2: ZeroFill failed to write
/var/lib/pgsql/data//pg_xlog/xlogtemp.25146: No space left on device
DEBUG: shutting down
FATAL 2: open of /var/lib/pgsql/data//pg_xlog/00000023000000FE (log
file 35, segment 254) failed: No such file or directory
------------------------------------------------------------------------
---------------------------------------------/SNIP

The last lines keep coming until no more diskspace is left. I searched
this forum and got some hints to this problem.
I raised the "checkpoint" default values in postgres.conf to these
values, which should have decreased the need for diskspace:
-------------
checkpoint_segments = 12 # in logfile segments (16MB each), min 1
checkpoint_timeout = 3600 # in seconds, range 30-3600
-------------
But the log files keep coming. And my original problem of vacuuming full
to reclaim space is unsolved....

I then moved pg_xlog to a different partition with 2 gigs of diskspace,
and raised wal_files further from 24 to 46. Ran "VACUUM FULL" once more.
Those 2 gigs was also used up pretty quickly...

So this is where i am now... desperate! Any suggestions will be greatly
appriciated!!!
What can i do to fix this mess??

Regards,
Thomas.
Tom Lane
2004-09-28 14:45:31 UTC
Permalink
Post by Thomas Madsen
I raised it to 3,000,000 and tried a "VACUUM FULL" on the table. Out
popped an error when it reached the toast table index: "Number of tuples
(xxx) not same at number in heap (XXX), recreate index". The numbers
(XXX) differed by just 2. It then started recreating the index, but
stopped short when another error popped up: "insertion of duplicate
keys"-failure.
VACUUM FULL will not try to "recreate the index", so you misinterpreted
what you were seeing.
Post by Thomas Madsen
I have subsequently attempted to reindex the tables/indexes, but it
didn't help a bit. I still got the: "Number of tuples (xxx) not same at
number in heap (XXX), recreate index"- error.
I have some vague memory that it's difficult to get the system to accept
a REINDEX on a toast table or toast index in older versions. REINDEXing
the base table will *not* fix it, you need to specify the toast object
by name. Does it let you do that?
Post by Thomas Madsen
The last lines keep coming until no more diskspace is left. I searched
this forum and got some hints to this problem.
I raised the "checkpoint" default values in postgres.conf to these
It's checkpoints that are responsible for recycling WAL files.
It sounds to me like checkpoints just aren't happening at all, which
seems odd. Does a checkpoint subprocess appear while this is going on?

It might well be that your best bet is to dump and reload that
particular table. You don't necessarily have to use pg_dump, you
could do something like
CREATE TABLE dup AS SELECT * from badtable;
TRUNCATE badtable;
INSERT INTO badtable SELECT * FROM dup;
DROP TABLE dup;
(7.2 had TRUNCATE didn't it? I forget.)

In any case I would *strongly* recommend an update to 7.2.5, which you
can do in-place without a dump/reload. We do not make bugfix releases
on a whim --- there are some pretty serious bugs fixed between 7.2.1 and
7.2.5.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Loading...