Discussion: replication using WAL archives

Iain
2004-10-21 06:44:21 UTC
Hi,

I thought I read something about this in relation to v8, but I can't find
any reference to it now... is it (or will it be) possible to do master-slave
style database replication by transmitting log files to the standby server
and having it process them?

Also, just out of interest, if I wanted to mine my log files so as to be
able to produce SQL would that be feasible? Presumably pg does something
like this internally for rollforward recovery, so I guess it would be
possible.

regards
Iain
Robert Treat
2004-10-21 18:41:21 UTC
Post by Iain
Hi,
I thought I read something about this in relation to v8, but I can't
find any reference to it now... is it (or will it be) possible to do
master-slave style database replication by transmitting log files to the
standby server and having it process them?
I'm not certain if this made it into 8.0, but some folks have created a working
version against the 8.0 code that does something like this. Search
the pgsql-hackers mailing list archives for more information on it.
Post by Iain
Also, just out of interest, if I wanted to mine my log files so as to be
able to produce SQL would that be feasible? Presumably pg does something
like this internally for rollforward recovery, so I guess it would be
possible.
If you want real SQL, you'll need to turn on SQL logging in
postgresql.conf.
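
For example, something like this in postgresql.conf captures every statement in
the server log ( a sketch only; 8.0 accepts 'none'/'ddl'/'mod'/'all' for
log_statement, while older releases use a plain boolean ):

log_statement = 'all'          # log every SQL statement the server executes
log_destination = 'stderr'
redirect_stderr = on           # 8.0: collect stderr into rotating log files
log_directory = 'pg_log'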


Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


Gaetano Mendola
2004-10-21 21:49:35 UTC
Post by Robert Treat
Post by Iain
Hi,
I thought I read something about this in relation to v8, but I can't
find any reference to it now... is it (or will it be) possible to do
master-slave style database replication by transmitting log files to the
standby server and having it process them?
I'm not certain if this made it into 8.0, but some folks have created a working
version against the 8.0 code that does something like this. Search
the pgsql-hackers mailing list archives for more information on it.
I sent a post to hackers; I'll paste it here:

=======================================================================
Hi all,
I saw that Eric Kerin did the work suggested by Tom on how
to use PITR in order to have a hot spare postgres,
writing a C program.

I did the same with 2 shell scripts: one of them performs
the restore, the other one delivers the partially filled WAL and
checks whether the postmaster is alive ( i.e. whether the process
with that pid still exists ).

With these two scripts I'm able to have a hot spare installation,
and the spare one goes live when the first postmaster dies.

How to test it:

1) Master node:
modify postgresql.conf using:

  archive_command = 'cp %p /mnt/server/archivedir/%f'

launch postgres and perform a base backup as the docs at

  http://developer.postgresql.org/docs/postgres/backup-online.html

suggest ( a sketch follows below ), then launch the script:

partial_wal_deliver.sh <PID> /mnt/server/partialdir <pg_xlog path>

this script will deliver the "current" WAL file every 10 seconds,
and touch the "alive" file in order to notify the spare node that
the master node is up and running
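
For reference, the base backup described in those docs boils down to roughly
this ( a sketch only; the data directory path and the backup label are just
examples ):

psql -c "SELECT pg_start_backup('spare_setup');" template1
tar -czf /mnt/server/base_backup.tar.gz -C /usr/local/pgsql data
psql -c "SELECT pg_stop_backup();" template1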


2) Spare node:
create a recovery.conf with the line:

  restore_command = 'restore.sh /mnt/server/archivedir/%f %p /mnt/server/partialdir'

replace the content of the data directory with the backup performed at point 1,
remove any file present in the pg_xlog directory ( leaving the archive_status
directory in place ) and remove the postmaster.pid file ( this is necessary if
you are running the spare postgres on the same hardware ).

launch the postmaster; the restore will continue until the "alive" file in the
/mnt/server/partialdir directory has not been updated for 60 seconds ( you can
modify this value inside the restore.sh script ).

Be sure that restore.sh and all directories involved are accessible ( a sketch
of these spare-node steps follows below ).
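
A sketch of these spare-node steps, assuming the same example paths as above
( adapt to your layout ):

pg_ctl stop -D /usr/local/pgsql/data 2>/dev/null        # make sure nothing is running
rm -rf /usr/local/pgsql/data
tar -xzf /mnt/server/base_backup.tar.gz -C /usr/local/pgsql
rm -f /usr/local/pgsql/data/pg_xlog/0*                  # WAL segments only; archive_status/ stays
rm -f /usr/local/pgsql/data/postmaster.pid
echo "restore_command = 'restore.sh /mnt/server/archivedir/%f %p /mnt/server/partialdir'" \
    > /usr/local/pgsql/data/recovery.conf
command -v restore.sh > /dev/null || echo "restore.sh not found in PATH"
pg_ctl start -D /usr/local/pgsql/data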

Let me know.


This is a first step, of course; as Eric Kerin did, it would be better to port
these scripts to C and make them more robust.

Postgres could help this process, as suggested by Tom, by creating a
pg_current_wal() function or, even better, by adding two new GUC parameters:
archive_current_wal_command and archive_current_wal_delay.

One problem I discovered during the tests is that if you shut down the spare node
while the restore_command is still waiting for a file, the postmaster will never
exit :-(
==========================================================================

I hope that is clear.



Regards
Gaetano Mendola
Iain
2004-10-22 01:52:09 UTC
Thanks Gaetano, and Robert.

I'm not in the habit of reading the hackers list, so I still have no idea
where I first read about it :)

Anyway, I did some reading and it seems like there are good possibilities
here, but still quite a bit of work to do. I'll add this to my list of HA
options and keep an eye on its progress.

regards
Iain
----- Original Message -----
From: "Gaetano Mendola" <***@bigfoot.com>
To: "Robert Treat" <***@users.sourceforge.net>;
<pgsql-***@postgresql.org>
Cc: <***@mst.co.jp>
Sent: Friday, October 22, 2004 6:49 AM
Subject: Re: [ADMIN] replication using WAL archives
--------------------------------------------------------------------------------
Post by Gaetano Mendola
#!/bin/bash
# restore.sh <archived wal> <target> <partial dir>
SOURCE=$1
TARGET=$2
PARTIAL=$3
SIZE_EXPECTED=16777216   # bytes, 16 MB (size of a full WAL segment)
DIED_TIME=60             # seconds without an "alive" touch before the master is presumed dead
function test_existence
{
    if [ -f ${SOURCE} ]
    then
        COUNTER=0
        # The file may still be being copied by the archiver;
        # I assume it will reach the full size in a few seconds
        while [ $( stat -c '%s' ${SOURCE} ) -lt $SIZE_EXPECTED ]
        do
            sleep 1
            let COUNTER+=1
            if [ 20 -lt $COUNTER ]
            then
                exit 1   # BAILING OUT
            fi
        done
        cp $SOURCE $TARGET
        exit 0
    fi
    echo ${SOURCE}" not found"
    # If we are looking for a history file and it does not exist,
    # exit immediately
    echo $SOURCE | grep history > /dev/null 2>&1 && exit 1
}
while [ 1 ]
do
    test_existence
    # CHECK IF THE MASTER IS ALIVE
    DELTA_TIME=$(( $( date +'%s' ) - $( stat -c '%Z' ${PARTIAL}/alive ) ))
    if [ $DIED_TIME -lt $DELTA_TIME ]
    then
        # Master is dead: fall back to the last partial WAL that was delivered
        echo "Master is dead..."
        CURRENT_WAL=$( basename $SOURCE )
        echo "Partial: " ${PARTIAL}
        echo "Current wal: " ${CURRENT_WAL}
        echo "Target: " ${TARGET}
        cp ${PARTIAL}/${CURRENT_WAL}.partial ${TARGET} > /dev/null 2>&1 && exit 0
        exit 1
    fi
    sleep 1
done
--------------------------------------------------------------------------------
Post by Gaetano Mendola
#!/bin/bash
# partial_wal_deliver.sh <postmaster PID> <partial dir> <pg_xlog path>
PID=$1
PARTIAL=$2
PGXLOG=$3
function copy_last_wal
{
    # The most recently modified file in pg_xlog (directories excluded) is the current WAL
    FILE=$( ls -t1p $PGXLOG | grep -v / | head -1 )
    echo "Last Wal> " $FILE
    cp ${PGXLOG}/${FILE} ${PARTIAL}/${FILE}.tmp
    mv ${PARTIAL}/${FILE}.tmp ${PARTIAL}/${FILE}.partial
    # Remove older .partial files, keeping only the current one
    find ${PARTIAL} -name '*.partial' | grep -v ${FILE} | xargs -i rm -fr {}
}
while [ 1 ]
do
    ps --pid $PID > /dev/null 2>&1
    ALIVE=$?
    if [ "${ALIVE}" == "1" ]
    then
        # The process is dead: ship the last partial WAL and stop
        echo "Process dead"
        copy_last_wal
        exit 1
    fi
    # The process still exists: tell the spare we are alive and ship the current WAL
    touch ${PARTIAL}/alive
    copy_last_wal
    sleep 10
done
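
To tie the pieces together, a possible way to start the delivery script on the
master ( a sketch only; the PID lookup and paths are examples ):

PG_PID=$( head -1 /usr/local/pgsql/data/postmaster.pid )   # first line of postmaster.pid is the PID
partial_wal_deliver.sh $PG_PID /mnt/server/partialdir /usr/local/pgsql/data/pg_xlog &

On the spare, recovery.conf's restore_command hands each requested segment to
restore.sh as shown in the earlier message.
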
Simon Riggs
2004-10-22 09:22:54 UTC
Post by Gaetano Mendola
Postgres could help this process, as suggested by Tom, by creating a
pg_current_wal() function or, even better, by adding two new GUC parameters:
archive_current_wal_command and archive_current_wal_delay.
OK, we can modify the archiver to do this as well as the archive-when-full
functionality. I'd already agreed to do something similar for 8.1

PROPOSAL:
By default, archive_max_delay would be 10 seconds.
By default, archive_current_wal_command is not set.
If archive_current_wal_command is not set, the archiver will archive a file
using archive_command only when the file is full.
If archive_current_wal_command is set, the archiver would archive a file
whichever of these occurs first...
- it is full
- the archive_max_delay timeout occurs (default: disabled)
...as you can see I've renamed archive_current_wal_delay to reflect the fact
that there is an interaction between the current mechanism (only when full)
and this additional mechanism (no longer than X secs between log files).
With that design, if the logs are being created quickly enough, then a
partial log file is never created, only full ones.
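
To make that concrete, a postgresql.conf fragment under this proposal might look
like the following ( note that archive_current_wal_command and archive_max_delay
are only the proposed names above, not parameters that exist today ):

archive_command             = 'cp %p /mnt/server/archivedir/%f'
archive_current_wal_command = 'cp %p /mnt/server/partialdir/%f.partial'   # proposed
archive_max_delay           = 10    # seconds; proposed, unset = archive only full files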

When an xlog file is archived because it is full, then it is sent to both
archive_current_wal_command and archive_command (in that order). When the
timeout occurs and we have a partial xlog file, it would only be sent to
archive_current_wal_command. It may also be desirable to not use
archive_command at all, only to use archive_current_wal_command. That's not
currently possible because archive_command is the switch by which all of the
archive functionality is enabled, so you can't actually turn this off.

There is already a timeout feature designed into the archiver for safety... so I
can make that read the GUCs above and act accordingly.

There is an unresolved resilience issue: if the archiver goes down (or
whatever does the partial_wal copy functionality) then it is possible
that users will continue writing to the database and creating xlog records.
It would be up to the user to define how to handle records that had been
committed to the first database in the interim before cutover. It would also
be up to the user to shut down the first node from the second - Shoot the
Other Node in the Head, as it's known. All of that is up to the second node,
and as Tom says, is "the hard part"....I'm not proposing to do anything
about that at this stage, since it is implementation dependent.

I was thinking perhaps to move to having variable size xlog files, since
their contents are now variable - no padded records at EOF. If we did that,
then we could simply issue a "switch logfile" and the archiver would cut in
anyway to copy away the xlog. Having said that, it is lots easier just to put a
blind timeout in the archiver and copy the file - though I'm fairly uneasy about
the point that we'd be ignoring the fact that many people are still writing to
it. But I propose doing it the easy way....

Thoughts?

= - = - =

Gaetano - skim-reading your script, how do you handle the situation when a
new xlog file has been written within 10 seconds? In that case the current file
number will have jumped by 2, so when your script looks for the "Last wal"
using head -1 it will find N+2 and the intermediate file will never be
copied. Looks like a problem to me...
Post by Gaetano Mendola
One problem I discovered during the tests is that if you shut down the spare node
while the restore_command is still waiting for a file, the postmaster will never
exit :-(
Hmm....Are you reporting this as a bug for 8.0? It's not on the bug list...

Do we consider that to be desirable or not?

Best Regards, Simon Riggs



Gaetano Mendola
2004-10-22 16:44:59 UTC

Post by Simon Riggs
When an xlog file is archived because it is full, then it is sent to both
archive_current_wal_command and archive_command (in that order). When the
timeout occurs and we have a partial xlog file, it would only be sent to
archive_current_wal_command. It may also be desirable to not use
archive_command at all, only to use archive_current_wal_command. That's not
currently possible because archive_command is the switch by which all of the
archive functionality is enabled, so you can't actually turn this off.

It seems good to me; the script behind archive_command can be a no-op if
someone wants to use only archive_current_wal_command.
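
For example, something as simple as

archive_command = '/bin/true'    # report success, copy nothing

would keep the archiver happy while doing no work ( a sketch; since a command
that reports success without copying means those segments are never kept, this
only makes sense if the proposed archive_current_wal_command does the real work ).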


Post by Simon Riggs
Gaetano - skim-reading your script, how do you handle the situation when a
new xlog file has been written within 10 seconds? In that case the current file
number will have jumped by 2, so when your script looks for the "Last wal"
using head -1 it will find N+2 and the intermediate file will never be
copied. Looks like a problem to me...


Yes, that is the only failure window I have seen ( but I don't know if it's possible ):

Master:
  log N created
  log N filled
  archive log N
  log N+1 created
  log N+1 filled
  log N+2 created
  <---- the master dies here, before archiving log N+1
  archive log N+1


in this case, as you underline, the last log archived is N; the N+2
partial WAL is added to the archived WAL collection, and in the recovery phase
the recovery stops after processing log N.

Is it possible that the postmaster creates the N+2 file without having finished
archiving N+1? ( I suspect yes :-( )

The only cure I see here is to look for unarchived WAL ( if possible ).
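
One possible way to spot them ( a sketch; it relies on the .ready status files
the 8.0 archiver keeps in pg_xlog/archive_status, and the path is an example ):

for f in /usr/local/pgsql/data/pg_xlog/archive_status/*.ready
do
    [ -e "$f" ] || continue                          # no pending segments at all
    echo "not yet archived: $( basename $f .ready )"
done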


Post by Simon Riggs
Post by Gaetano Mendola
One problem I discovered during the tests is that if you shut down the spare
node while the restore_command is still waiting for a file, the postmaster
will never exit :-(
Hmm....Are you reporting this as a bug for 8.0? It's not on the bug list...

For me it is behaviour to avoid.
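
One possible workaround at the script level ( a sketch only; /tmp/stop_recovery
is a made-up trigger path ) would be to let restore.sh give up when it is told
to, so a shutdown of the spare is not blocked forever:

# inside restore.sh's wait loop:
if [ -f /tmp/stop_recovery ]
then
    exit 1    # report failure so recovery stops waiting and the postmaster can exit
fi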



Regards
Gaetano Mendola







Simon Riggs
2004-10-22 17:29:10 UTC
Post by Gaetano Mendola
Is it possible that the postmaster creates the N+2 file without having finished
archiving N+1? ( I suspect yes :-( )
The only cure I see here is to look for unarchived WAL ( if possible ).
Hmm...well you aren't looking for archived wal, you're just looking at
wal...which is a different thing...

Situation I thought I saw was:

- copy away current partial filled xlog N
- xlog N fills, N+1 starts
- xlog N+1 fills, N+2 starts
- copy away current partial filled xlog: N+2 (+10 secs later)

i.e. if time to fill xlog (is ever) < time to copy away current xlog,
then you miss one.

So problem: you can miss one and never know you've missed one until the
recovery can't find it, which it never returns from...so it just hangs.

[Just so we're all clear: we're talking about Gaetano's script, not the
PostgreSQL archiver. The PostgreSQL archiver doesn't do it that way, so
it never misses one.]
--
Best Regards, Simon Riggs


Gaetano Mendola
2004-10-22 20:50:34 UTC
Post by Simon Riggs
- copy away current partial filled xlog N
- xlog N fills, N+1 starts
- xlog N+1 fills, N+2 starts
- copy away current partial filled xlog: N+2 (+10 secs later)
i.e. if time to fill xlog (is ever) < time to copy away current xlog,
then you miss one.
So problem: you can miss one and never know you've missed one until the
recovery can't find it, which it never returns from...so it just hangs.
No. The restore.sh is not smart enough to know the last WAL that must be
replayed; the only "smart thing" it does is to copy the supposed "current WAL"
into the archive directory.

The script hangs ( and that is a feature, not a bug ) if and only if the master
is alive ( at least I'm not seeing any other hang ).

In your example the archive directory will contain the files up to logN, and
logN+2 ( the current WAL ) is in the partial directory. If the master dies,
restore.sh will copy logN+2 into the archive directory; the spare node will then
execute restore.sh with logN+1 as argument, and since it is not found,
restore.sh will exit.


Regards
Gaetano Mendola
