Vacuum full - disk space eaten by WAL logfiles

Thomas Madsen

2004-09-28 13:12:25 UTC

Hi!

I ran in to a serious problem with a server running Postgresql 7.2.1. I
know that an upgrade would be in its place, but this is not an option at
the moment.

The problem started with the server running out of space (toast table
galore) due to a too low max_fsm_pages number (1,000,000 was too low on
this particular server).
I raised it to 3,000,000 and tried a "VACUUM FULL" on the table. Out
popped an error when it reached the toast table index: "Number of tuples
(xxx) not same at number in heap (XXX), recreate index". The numbers
(XXX) differed by just 2. It then started recreating the index, but
stopped short when another error popped up: "insertion of duplicate
keys"-failure.

I have subsequently attempted to reindex the tables/indexes, but it
didn't help a bit. I still got the: "Number of tuples (xxx) not same at
number in heap (XXX), recreate index"- error.

I then tried to drop the indexes on the table in question, and then run
another "VACUUM FULL". The result is this:
SNIP
------------------------------------------------------------------------
------------------------------
backend> vacuum full infected;
DEBUG: --Relation infected--
DEBUG: Pages 71795: Changed 0, reaped 16560, Empty 0, New 0; Tup
748830: Vac 0, Keep/VTL 0/0, UnUsed 81433, MinLen 142, MaxLen 2032;
Re-using: Free/Avail. Space 13279172/10747696; EndEmpty/Avail. Pages
0/21243.
CPU 2.05s/0.24u sec elapsed 69.52 sec.
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG: Rel infected: Pages: 71795 --> 71795; Tuple(s) moved: 0.
CPU 1.37s/1.23u sec elapsed 112.46 sec.
DEBUG: --Relation pg_toast_16581--
DEBUG: Pages 1517406: Changed 48938, reaped 1134683, Empty 217179, New
0; Tup 1598162: Vac 201754, Keep/VTL 0/0, UnUsed 4053916, MinLen 45,
MaxLen 2034; Re-using: Free/Avail. Space 9420709796/9419556500;
EndEmpty/Avail. Pages 0/1354671.
CPU 38.72s/5.26u sec elapsed 850.95 sec.
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
....
...
...
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
FATAL 2: ZeroFill failed to write
/var/lib/pgsql/data//pg_xlog/xlogtemp.25146: No space left on device
DEBUG: shutting down
FATAL 2: open of /var/lib/pgsql/data//pg_xlog/00000023000000FE (log
file 35, segment 254) failed: No such file or directory
------------------------------------------------------------------------
---------------------------------------------/SNIP

The last lines keep coming until no more diskspace is left. I searched
this forum and got some hints to this problem.
I raised the "checkpoint" default values in postgres.conf to these
values, which should have decreased the need for diskspace:
-------------
checkpoint_segments = 12 # in logfile segments (16MB each), min 1
checkpoint_timeout = 3600 # in seconds, range 30-3600
-------------
But the log files keep coming. And my original problem of vacuuming full
to reclaim space is unsolved....

I then moved pg_xlog to a different partition with 2 gigs of diskspace,
and raised wal_files further from 24 to 46. Ran "VACUUM FULL" once more.
Those 2 gigs was also used up pretty quickly...

So this is where i am now... desperate! Any suggestions will be greatly
appriciated!!!
What can i do to fix this mess??

Regards,
Thomas.