Discussion:
How to do fast, reliable backups?
Chris Ruprecht
2004-03-05 21:37:42 UTC
Hi all,

I am wondering how you guys back up your databases. Say I have a 20 GB
database, data and indexes. If I run pg_dump on this, it backs up the schema
and the data. When I have to restore this, I would have to run it through
psql, which would then rebuild the indexes after it has inserted the records.
That's not really a feasible option as it takes too long.
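
For reference, this is roughly the dump-and-restore cycle I mean, wrapped in
Python so it can be scheduled from cron (the database name and file path are
just placeholders):

#!/usr/bin/env python
# Minimal sketch of a plain pg_dump backup and psql restore cycle.
# "mydb" and the dump file path are placeholder values.
import subprocess

DB = "mydb"
DUMP_FILE = "/backup/mydb.sql"

def dump():
    # pg_dump writes schema + data as SQL; indexes are only recreated
    # when the dump is replayed.
    with open(DUMP_FILE, "w") as out:
        subprocess.check_call(["pg_dump", DB], stdout=out)

def restore():
    # Replaying the dump through psql re-inserts every row and then
    # rebuilds every index -- the slow step.
    with open(DUMP_FILE) as inp:
        subprocess.check_call(["psql", DB], stdin=inp)

if __name__ == "__main__":
    dump()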

If I back up the pgdata directory, I will get inconsistent data, as somebody
might update data in 2 tables while my backup is running, one change in a
table I have already backed up, one in a table I have not. The solution would
be to shut the database server down, do the backup and then start everything
back up. But that's not really an option in a 24/7 environment.

What I'd like to see is a transaction aware backup which backs up indexes as
well as data.

Any ideas?

Best regards,
Chris



Silvana Di Martino
2004-03-06 13:39:28 UTC
Post by Chris Ruprecht
I am wondering how you guys back up your databases. Say I have a 20 GB
database, data and indexes. If I run pg_dump on this, it backs up the
schema and the data. When I have to restore this, I would have to run it
through psql, which would then rebuild the indexes after it has inserted
the records. That's not really a feasible option as it takes too long.
When you have large databases and you need consistent backups, an interesting
solution is master-slave replication. Just use a second server on your LAN as
a slave replica of your main database server. Should you have any problem
with the main server, just switch the IP addresses of the master and slave
servers and you are up and running again. The switch takes a few
milliseconds and can even be performed automatically by a few specific
programs (look for "fault-tolerant linux systems" with Google). While your
backup server serves your users, you have all the time you need to install a
new hard disk in the main server, restore your data (from the running slave
server or from a tape) and bring it back into service.

When you need to put your data onto a tape or a CD-ROM, just stop the
master-slave mechanism, make a "static" backup (a pg_dump or a file system
copy) of your _slave_ server and restart the replication program. The
master-slave replication mechanism will take care of re-aligning the two
servers in a few minutes. This way, you never need to stop the main server.
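
A rough sketch of that cycle in Python; the pause/resume commands stand in
for whatever your replication tool actually provides, and the host and
database names are invented:

#!/usr/bin/env python
# Sketch of a "static" backup taken from the slave while the master keeps
# serving clients. The replication pause/resume commands are hypothetical
# placeholders -- substitute whatever your replication tool really uses.
import subprocess
import time

SLAVE_DB = "mydb"                                  # placeholder database name
DUMP_FILE = "/backup/mydb-%s.sql" % time.strftime("%Y%m%d")

def run(cmd, **kw):
    subprocess.check_call(cmd, **kw)

# 1. Stop applying changes on the slave (placeholder command).
run(["/usr/local/bin/replication-ctl", "pause"])

# 2. Take the static dump from the *slave*, not the master.
with open(DUMP_FILE, "w") as out:
    run(["pg_dump", "-h", "slave.example.com", SLAVE_DB], stdout=out)

# 3. Resume replication; the tool re-aligns the slave with the master.
run(["/usr/local/bin/replication-ctl", "resume"])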
Post by Chris Ruprecht
If I back up the pgdata directory, I will get inconsistent data, as
somebody might update data in 2 tables while my backup is running, one
change in a table I have already backed up, one in a table I have not. The
solution would be to shut the database server down, do the backup and then
start everything back up. But that's not really an option in a 24/7
environment.
Making a copy of your data directory while your server is running is
definitely a bad idea. Forget it. Backing up the file system used by an RDBMS
usually requires putting the server off-line and flushing all the unwritten
data to disk. This cannot be done in a 24/7 environment.
Post by Chris Ruprecht
What I'd like to see is a transaction aware backup which backs up indexes
as well as data.
Have a look at one of the master-slave replication systems available on the
net. For this task you do not need anything as sophisticated as a
multi-master replication system (like Bettina Kemme's "PG-Replicator"
project). You could even write your own client program in Perl or Python.
The program just has to connect to your running master server, read the
relevant data and insert it into the database on the slave server. Using a
SERIAL primary key, it is quite easy to spot which rows of a table have not
yet been copied to the slave server. Using a timer, you can define the time
granularity of the replication mechanism that best fits your needs. Using
transactions, you can be sure you copy only consistent data. Reading and
saving the system tables, you can back up even the most hidden and subtle
parts of your DB.
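
A minimal sketch of such a home-grown copier in Python (using psycopg2 here;
the connection strings, table and column names are invented, and this naive
version only handles inserts, not updates or deletes):

#!/usr/bin/env python
# Sketch of a poor-man's master->slave copier driven by a SERIAL key.
# Connection strings, table and column names are invented for illustration.
import time
import psycopg2

MASTER_DSN = "host=master dbname=mydb user=backup"
SLAVE_DSN = "host=slave dbname=mydb user=backup"
TABLE = "orders"            # hypothetical table with a SERIAL column "id"

def copy_new_rows(master, slave):
    mcur = master.cursor()
    scur = slave.cursor()
    # The highest id already present on the slave tells us where to resume.
    scur.execute("SELECT coalesce(max(id), 0) FROM %s" % TABLE)
    last_id = scur.fetchone()[0]
    # Read everything newer than that inside one transaction, so the
    # snapshot we copy is consistent.
    mcur.execute("SELECT * FROM %s WHERE id > %%s ORDER BY id" % TABLE,
                 (last_id,))
    for row in mcur.fetchall():
        placeholders = ", ".join(["%s"] * len(row))
        scur.execute("INSERT INTO %s VALUES (%s)" % (TABLE, placeholders), row)
    slave.commit()
    master.rollback()       # end the read-only transaction on the master

if __name__ == "__main__":
    master = psycopg2.connect(MASTER_DSN)
    slave = psycopg2.connect(SLAVE_DSN)
    while True:             # the timer sets the replication granularity
        copy_new_rows(master, slave)
        time.sleep(60)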

Beware that, with a custom-made system, restoring your data is up to you.
Test the restore procedure before using your program in a production
environment.

Good luck.
-----------------------------------------
Alessandro Bottoni and Silvana Di Martino
***@interfree.it
***@tin.it

Andrew Sullivan
2004-03-06 16:30:59 UTC
Post by Silvana Di Martino
a slave replica of your main database server. Should you have any problem
with the main server, just switch the IP addresses of the master and slave
servers and you are up and running again. The switching takes a few
milliseconds and can even be performed automatically by a few specific
programs (look for "fault-tolerant linux systems" with google). While your
And what do you do about the records which made it to the master
database but which haven't been applied on the slave yet?

Asynchronous replication with automatic failover is _dangerous_,
because you can strand records. Many commercial systems do this with
what they describe as conflict-management; in truth, this usually
amounts to marking _both_ records as suspect and letting the DBA
decide what to do.

I'm not saying that master-slave systems are a bad idea, but you'd
better be aware of what you're exposing yourself to before
considering such hot-failover cases.

A
--
Andrew Sullivan | ***@crankycanuck.ca

Silvana Di Martino
2004-03-07 08:54:07 UTC
Post by Andrew Sullivan
Asynchronous replication with automatic failover is _dangerous_,
because you can strand records. Many commercial systems do this with
what they describe as conflict-management; in truth, this usually
amounts to marking _both_ records as suspect and letting the DBA
decide what to do.
Of course, that's right. Nevertheless, it is hard to imagine anything better
than master-slave replication as a backup/recovery mechanism. For example,
how much more data would you lose during a file system copy?

Much of the overall reliability depends on the details of the replication and
of the failover mechanisms:
- you need to copy the master db to the slave at a good speed and with a high
frequency, in order not to leave behind any unsaved data. This means having
good hardware and a good network.
- the failover mechanism should be designed, if possible, to flush all unsaved
data before shutting down the master and replacing it with the slave. This is
often just a matter of waiting for the next replication step.
- some kind of database "journaling" can help: write down any data-change
request sent to the master before actually performing it. This should let you
spot any unsaved operation at restore time (see the sketch after this list).
- and so on....
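
As a very rough illustration of the journaling point above, a client could
write each data-change statement to a log file on a separate drive before
sending it to the master (the paths, connection string and example statement
are made up):

#!/usr/bin/env python
# Sketch of client-side "journaling": log every data-change statement to a
# file on a separate disk before sending it to the master. Paths, DSN and
# the example statement are invented for illustration.
import time
import psycopg2

JOURNAL = "/journal/changes.log"    # keep this on a different drive
conn = psycopg2.connect("host=master dbname=mydb user=app")

def journaled_execute(cur, sql, params):
    # Record the request first, so it can be replayed if the master dies
    # before the change reaches the slave.
    with open(JOURNAL, "a") as log:
        log.write("%s\t%s\t%r\n" % (time.asctime(), sql, params))
    cur.execute(sql, params)

cur = conn.cursor()
journaled_execute(cur,
                  "UPDATE accounts SET balance = balance - %s WHERE id = %s",
                  (100, 42))
conn.commit()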

Unfortunately, the only way to guarantee the system against any loss of data
would be to use a federation of servers with a federation-wide concept of
transactions. That is: a transaction does not commit until all federated
servers have performed the required change to the data. Bettina Kemme's
replication engine seems to be evolving in this direction but, as far as I
know, no PostgreSQL replication system actually implements this mechanism
yet.

See you.
-----------------------------------------
Alessandro Bottoni and Silvana Di Martino
***@interfree.it
***@tin.it

Andrew Sullivan
2004-03-07 15:32:58 UTC
Post by Silvana Di Martino
Of course, that's right. Nevertheless, it is hard to imagine anything better
than master-slave replication as a backup/recovery mechanism. For example,
how much more data would you lose during a file system copy?
This actually depends on how the PITR mechanism is implemented. A
common approach is to take a filesystem backup followed by log shipping
to multiple, redundant destinations. In this scenario, you only lose
what isn't in the logs, which in the case of Postgres means you only
lose uncommitted transactions.

A
--
Andrew Sullivan | ***@crankycanuck.ca
This work was visionary and imaginative, and goes to show that visionary
and imaginative work need not end up well.
--Dennis Ritchie

Chris Ruprecht
2004-03-07 19:47:17 UTC
Post by Silvana Di Martino
Making a copy of your data directory while your server is running is
definitely a bad idea. Forget it. Backing up the file system used by an RDBMS
usually requires putting the server off-line and flushing all the unwritten
data to disk. This cannot be done in a 24/7 environment.
Post by Chris Ruprecht
What I'd like to see is a transaction aware backup which backs up indexes
as well as data.
Well, at work we're using a commercial database called Progress. It has
a very nice backup utility which creates a backup file as follows:

All current database blocks are backed up as they are at the time the
backup starts. Each DB block has a unique identifier. If a running
transaction wants to write to a block which has not yet been backed
up, the transaction is put on hold, the block is backed up, and the
transaction can then continue.
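
Just to make the idea concrete, here is a toy Python sketch of that
copy-on-write scheme; it has nothing to do with Progress's actual
implementation, and the block identifiers and locking are invented purely
for illustration:

#!/usr/bin/env python
# Toy illustration of the backup scheme described above: blocks are copied
# as they were when the backup started, and a writer touching a not-yet-saved
# block waits until that block has been backed up. Purely a sketch.
import threading

class OnlineBackup(object):
    def __init__(self, blocks):
        self.blocks = blocks              # block_id -> block contents
        self.saved = set()                # blocks already written to backup
        self.lock = threading.Lock()
        self.backup = {}

    def _save(self, block_id):
        self.backup[block_id] = self.blocks[block_id]
        self.saved.add(block_id)

    def write_block(self, block_id, data):
        # A transaction that wants to modify a block which has not yet been
        # backed up is "put on hold" while that block is saved first.
        with self.lock:
            if block_id not in self.saved:
                self._save(block_id)
            self.blocks[block_id] = data

    def run_backup(self):
        # Sweep through all blocks; writers may already have forced some
        # of them into the backup ahead of us.
        for block_id in list(self.blocks):
            with self.lock:
                if block_id not in self.saved:
                    self._save(block_id)
        return self.backup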

If you restore a database which contains blocks of incomplete
transactions, these transactions are backed out when you start the DB
server (broker).

The DB can also (though not by default) maintain an after-image log which
contains all transactions since the last backup. This log needs to be
kept on a drive different from the DB's. In case the DB drive crashes,
you can recover from your last backup and then roll forward the after-image
log.

Is there any slight possibility that some such mechanism could make it into
PostgreSQL at some stage (maybe version 8)?

Best regards,
Chris

--
Chris Ruprecht
Network Grunt and Bit Pusher extraordinaire


