Discussion:
pg_restore TODO - delay PK creation
Iain
2004-11-01 02:22:31 UTC
Hi,

I'm wondering whether this is already on some TODO list for pg_restore. I
didn't find any mention of it anywhere, so I thought I should post this and
see what people think.

Basically, I'd like to see an option at restore time to not include the
primary key constraint when issuing the create table command. I'd like the
PK to be added after data has been loaded using an ALTER command.

The principal reason for this is performance.

There may also be a bug somewhere, or perhaps just a problem with my system,
but I was trying to restore a fairly large table (over 7,000,000 rows), which
would run for a couple of hours before failing. Dropping the PK enabled the
load to complete in 3 or 4 minutes. Adding the PK took another 3 or 4
minutes, which adds up to quite a difference.
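
The manual workaround described above (drop the key, load, re-add the key) could be scripted roughly as follows. This is only a sketch: the function name load_then_add_pk, its arguments, and the <table>_pkey constraint name are assumptions made for illustration, and it assumes the target server accepts ALTER TABLE ... DROP CONSTRAINT on a primary key.

```shell
# Sketch only: load a table without its primary key, then add the key back.
# Assumptions: the constraint is named <table>_pkey and the data file is in
# COPY text format. All names passed in are hypothetical.
load_then_add_pk() {
    db="$1"        # target database
    table="$2"     # table to load
    pkcol="$3"     # primary key column
    datafile="$4"  # file holding the rows in COPY text format

    # 1. Drop the key so the load does not have to maintain its index.
    psql -d "$db" -c "ALTER TABLE $table DROP CONSTRAINT ${table}_pkey" &&
    # 2. Bulk-load the rows; psql feeds its standard input to COPY.
    psql -d "$db" -c "COPY $table FROM STDIN" < "$datafile" &&
    # 3. Re-create the key; the index is then built once over the loaded data.
    psql -d "$db" -c "ALTER TABLE $table ADD CONSTRAINT ${table}_pkey PRIMARY KEY ($pkcol)"
}
```

A call might look like: load_then_add_pk mydb big_table id /tmp/big_table.copy (all four values are placeholders). The steps are chained with && so a failed load does not go on to add the key.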

regards
Iain
Bruce Momjian
2004-11-01 02:36:06 UTC
I don't know what PostgreSQL version you have, but we currently do what
you suggest, and I think we have been doing it for a few releases now:

---------------------------------------------------------------------------

--
-- Name: test; Type: TABLE; Schema: public; Owner: postgres
--

CREATE TABLE test (
    x integer NOT NULL
);


ALTER TABLE public.test OWNER TO postgres;

--
-- Data for Name: test; Type: TABLE DATA; Schema: public; Owner: postgres
--

COPY test (x) FROM stdin;
1
\.


--
-- Name: test_pkey; Type: CONSTRAINT; Schema: public; Owner: postgres
--

ALTER TABLE ONLY test
    ADD CONSTRAINT test_pkey PRIMARY KEY (x);
--
Bruce Momjian | http://candle.pha.pa.us
***@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ***@postgresql.org
Iain
2004-11-01 03:11:00 UTC
Hi Bruce,

OK, I've just looked into it and you are right, thanks.

In the case I just tested, I restored a 7.1 dump to a 7.4.6 db. I had assumed
that this was a restore-time issue, but in fact it depends on the format
of the dump file. I noticed the problem after the data load failed and I
checked the description of the table in question: it had a primary key.

Worth noting for anybody preparing to upgrade large databases from old
versions. This db takes 8 hours to restore on 7.1 but I just dumped it and
restored it in less than 20 minutes total on 7.4 :)

I haven't used 7.4 for dumping/restoring for a long time so I had forgotten
how it worked. We're just starting a redevelopment that will include an
upgrade of the production db so I'm looking forward to working with 7.4
again, though I'd like to be working on v8.

Out of interest, what is the word on using newer versions of pg_dump on
older versions of the DB? Is it possible, or even wise, to unload a 7.1
DB with the 7.4 version of pg_dump?

Regards
iain



Bruce Momjian
2004-11-01 03:25:24 UTC
Post by Iain
Hi Bruce,
OK, I've just looked into it and you are right, thanks.
In the case I just tested, I restored a 7.1 dump to a 7.4.6 db. I had assumed
that this was a restore-time issue, but in fact it depends on the format
of the dump file. I noticed the problem after the data load failed and I
checked the description of the table in question: it had a primary key.
Worth noting for anybody preparing to upgrade large databases from old
versions. This db takes 8 hours to restore on 7.1 but I just dumped it and
restored it in less than 20 minutes total on 7.4 :)
Quite a dramatic improvement, though 7.1 is in the ancient category.
Post by Iain
I haven't used 7.4 for dumping/restoring for a long time so I had forgotten
how it worked. We're just starting a redevelopment that will include an
upgrade of the production db so I'm looking forward to working with 7.4
again, though I'd like to be working on v8.
Out of interest, what is the word on using newer versions of pg_dump on
older versions of the DB? Is it possible, or even wise, to unload a 7.1
DB with the 7.4 version of pg_dump?
It should work. I see version checks for >=70100 in the code so you
should be fine using 7.4 or 8.0 pg_dump for 7.1.
Tom Lane
2004-11-01 03:36:22 UTC
Post by Iain
Out of interest, what is the word on using newer versions of pg_dump on
older versions of the DB? Is it possible, or even wise, to unload a 7.1
DB with the 7.4 version of pg_dump?
Standard, recommended procedure is to use the later version of pg_dump
to unload data from the old server. This gets you the benefit of any
bug fixes in the newer pg_dump (of which there are always some...)

Note that pg_dump is usually not designed to produce output that will
load into back-release servers, so this only applies to cross-version
upgrades. You can't use the 7.4 pg_dump against a 7.1 server and expect
to get a dump file that will load back into the 7.1 server --- more than
likely, the dump file will make use of features that weren't in 7.1.
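
As a rough illustration of the procedure described above, the newer pg_dump can be pointed at the old server and its output loaded into the new one. This is a sketch under assumptions: the function name dump_old_load_new, the host names, and the database name are placeholders, the pg_dump and psql found on the PATH are taken to be the newer release's binaries, and the target database is assumed to exist already.

```shell
# Sketch only: dump the OLD server with the NEWER pg_dump, then load the
# plain-text dump into the NEW server. Host and database names are hypothetical.
dump_old_load_new() {
    old_host="$1"   # host running the old (e.g. 7.1) server
    new_host="$2"   # host running the new (e.g. 7.4) server
    db="$3"         # database name, assumed to exist on both servers
    dumpfile="$db.sql"

    # The pg_dump on PATH must be the newer release's binary.
    pg_dump -h "$old_host" "$db" > "$dumpfile" || return 1

    # The resulting file is meant for the new server, not for reloading
    # into the old one.
    psql -h "$new_host" -d "$db" -f "$dumpfile"
}
```

A call might look like: dump_old_load_new oldbox newbox mydb (placeholder names). The dump is written to mydb.sql in the current directory.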

regards, tom lane

Iain
2004-11-01 05:42:16 UTC
Thanks guys, I'll keep that in mind.

Sorry to keep on about this, but please bear with me for one more m(_ _)m
<-- this is me abasing myself before you

IIRC there has been a bug fix to the COPY command, regarding handling
backslashed Ns or something like that, sometime between 7.1 and 7.4. If I
dump the 7.1 db with the 7.4 version of pg_dump, do I get the fixes to
COPY?

The documentation for psql is clear that it uses a frontend version of copy,
so I'm gonna assume for now (until I test it or am told either way) that psql
7.4 will use the later version of copy against a 7.1 db.

Because the old db is badly in need of maintenance and some tables take a
considerable time to unload, I was considering running a few concurrent
sessions of psql and using the COPY command so I can start loading smaller
tables on the new server while the bigger ones are still unloading. Most of
the old tables have to be transformed to a new format, and I wanna get home
by the last train if possible :) .
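
The concurrent-session idea above might be sketched like this, streaming each table straight from the old server into the new one. It rests on several assumptions: the function names, host names, and table names are hypothetical, the newer psql is assumed to be able to connect to the old server, the target tables must already exist with matching columns, and no transformation to a new format is done here.

```shell
# Sketch only: stream one table from the old server to the new one.
copy_table() {
    old_host="$1"; new_host="$2"; db="$3"; table="$4"
    # COPY ... TO STDOUT on the old server is piped into COPY ... FROM STDIN
    # on the new server, so no intermediate file is needed.
    psql -h "$old_host" -d "$db" -c "COPY $table TO STDOUT" |
        psql -h "$new_host" -d "$db" -c "COPY $table FROM STDIN"
}

# Start one background copy per table and wait for all of them to finish.
# Note: a bare "wait" does not report failures of individual copies, so the
# results still need to be checked afterwards.
copy_tables_concurrently() {
    old_host="$1"; new_host="$2"; db="$3"
    shift 3
    for table in "$@"; do
        copy_table "$old_host" "$new_host" "$db" "$table" &
    done
    wait
}
```

A call might look like: copy_tables_concurrently oldbox newbox mydb small_a small_b (placeholder names). Larger tables could be started in a separate call so the smaller ones finish loading first.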

regards
iain

Tom Lane
2004-11-01 05:56:24 UTC
Post by Iain
IIRC there has been a bug fix to the COPY command, regarding handling
backslashed Ns or something like that, sometime between 7.1 and 7.4. If I
dump the 7.1 db with the 7.4 version of pg_dump, do I get the fixes to
COPY?
My recollection is that some of those changes were on the backend side,
and so they would affect the COPY data that pg_dump transcribes to the
dump file. Using the newer pg_dump will not make you any worse off
AFAIR, but it's not a magic fix for server-side bugs either ...

regards, tom lane

Iain
2004-11-01 06:35:21 UTC
Hi Bruce and Tom,

I've got the picture now, thanks.

I'll be sure to test it all pretty thoroughly before the day anyway.

Thanks again.

Iain