Discussion:
Large transaction problem
Paul B. Anderson
2004-11-08 22:58:12 UTC
I'm trying to move data from an Oracle database into postgresql 7.4.3.
I have a program that is extracting data and writing it to an sql script
as a long series of inserts (20,000 at a time). When I run these
scripts using psql, I occasionally get a problem that is reported as

ERROR: canceling query due to user request

And then it won't clear. If I quit psql and start it again, every sql
statement on that database gives this error after 30 seconds (approx) of
hanging. The only way to clear this is to restart the postgresql
backend and then try vacuuming a couple of times. It eventually clears
in this manner.

What am I doing wrong? I increased the WAL segments from 3 to 30 but
this did not get rid of the problem. Is there a limit on the size of a
transaction?

Thanks.

Paul
Tom Lane
2004-11-08 23:19:21 UTC
Post by Paul B. Anderson
I'm trying to move data from an Oracle database into postgresql 7.4.3.
I have a program that is extracting data and writing it to an sql script
as a long series of inserts (20,000 at a time). When I run these
scripts using psql, I occasionally get a problem that is reported as
ERROR: canceling query due to user request
And then it won't clear. If I quit psql and start it again, every sql
statement on that database gives this error after 30 seconds (approx) of
hanging.
It sounds to me like you are starting the postmaster under finite ulimit
settings --- could be either CPU or filesize limits.

regards, tom lane

Paul B. Anderson
2004-11-11 19:50:26 UTC
I think I've solved this problem. It involved the vacuumdb command
rather than the large insert. I found this out when a nightly vacuumdb
script reported the problem even though no records had been loaded at all.

My postgresql.conf file had a 10 second timeout and the large database
required more than 10 seconds for the vacuum. It seems that this left
postmaster and/or the particular database in a state where any SQL
against that database gave the same error response about being canceled
by the user.

I'm guessing postgresql had left some internal state set for the
database and was trying to complete the vacuum first. Anyway, I set
the statement_timeout = 0 and the problem isn't recurring.
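For reference, a minimal sketch of checking and disabling the timeout from a
psql session (in 7.4 the value is in milliseconds, so the 10-second limit above
would have been statement_timeout = 10000 in postgresql.conf):

-- Show the timeout currently in effect for this session (0 means disabled).
SHOW statement_timeout;

-- Disable it for this session only, e.g. before a long-running load or vacuum.
SET statement_timeout = 0;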

Paul
Ulimit says
# su - postgres
-bash-2.05b$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
max locked memory (kbytes, -l) 4
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 3072
virtual memory (kbytes, -v) unlimited
None of the actual limits seems to be a problem. I can't seem to change
max locked memory using ulimit but I'll try increasing stack size.
Could that be involved?
No, the 'file size' and 'cpu time' limits are the ones I'd expect to
cause this sort of behavior.

One possible gotcha is that depending on how you start the postmaster,
it might be living under different ulimit settings than what you see
interactively.

regards, tom lane
Tom Lane
2004-11-11 21:07:35 UTC
Post by Paul B. Anderson
My postgresql.conf file had a 10 second timeout and the large database
required more than 10 seconds for the vacuum. It seems that this left
postmaster and/or the particular database in a state where any SQL
against that database gave the same error response about being canceled
by the user.
Hmm, I couldn't duplicate this. I thought maybe the vacuum wasn't
releasing some lock after it failed, but there's no sign of such a
problem.

Are you sure it isn't just that all your queries were running into the
timeout?

regards, tom lane

Paul B. Anderson
2004-11-11 22:14:50 UTC
After I restart postgresql, I execute vacuum several times in sql until
it succeeds. Then, everything else works OK again.

After I clear the problem,

select count(*) from archive

takes a second but, before the problem is cleared, it takes about 30
seconds and then gives the canceled by user response.

The vacuum was run using the vacuumdb command rather than from psql. It
was in a cron script running under user postgres. There was a .pgpass
file. The command was

/usr/bin/vacuumdb --all --full --analyze

This is postgresql on Red Hat Enterprise Linux 3 (ES) from RPMs
postgresql-7.4.3-2PGDG, etc.

Thanks.

Paul
Post by Tom Lane
Post by Paul B. Anderson
My postgresql.conf file had a 10 second timeout and the large database
required more than 10 seconds for the vacuum. It seems that this left
postmaster and/or the particular database in a state where any SQL
against that database gave the same error response about being canceled
by the user.
Hmm, I couldn't duplicate this. I thought maybe the vacuum wasn't
releasing some lock after it failed, but there's no sign of such a
problem.
Are you sure it isn't just that all your queries were running into the
timeout?
regards, tom lane
Tom Lane
2004-11-11 22:48:52 UTC
Post by Paul B. Anderson
After I restart postgresql, I execute vacuum several times in sql until
it succeeds. Then, everything else works OK again.
After I clear the problem,
select count(*) from archive
takes a second but, before the problem is cleared, it takes about 30
seconds and then gives the canceled by user response.
This all sounds exactly like an overly enthusiastic statement_timeout.
What you are doing to "clear the problem" is merely to populate the disk
cache sufficiently that your query finishes before timing out.
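If a server-wide timeout is still wanted for ordinary queries, one alternative
(a sketch, not something suggested in the thread) is to lift it only in the
session that runs the maintenance work, e.g. when vacuuming from psql rather
than vacuumdb:

-- Keep the configured statement_timeout for everyone else, but lift it here.
SET statement_timeout = 0;
VACUUM ANALYZE;
-- New sessions still get the default from postgresql.conf.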

regards, tom lane

Andrew Sullivan
2004-11-12 14:33:49 UTC
Post by Paul B. Anderson
/usr/bin/vacuumdb --all --full --analyze
^^^^^

You do realise that takes an exclusive lock, right? Why are you
doing VACUUM FULL? If your FSM is set right, you really shouldn't
need it for most cases.
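A quick sketch of checking the FSM settings Andrew is referring to; in 7.4 the
relevant parameters are max_fsm_pages and max_fsm_relations, and with those
sized to track the dead rows a plain vacuum is usually enough:

-- Current free space map settings (the 7.4 defaults are fairly small).
SHOW max_fsm_pages;
SHOW max_fsm_relations;

-- With the FSM large enough, a plain (non-FULL) vacuum marks dead space for
-- reuse without taking the exclusive lock that VACUUM FULL needs.
VACUUM ANALYZE;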

A
--
Andrew Sullivan | ***@crankycanuck.ca
The plural of anecdote is not data.
--Roger Brinner

Paul B. Anderson
2004-11-12 15:30:27 UTC
When the vacuum full timed out, could the exclusive lock have been left
set, leaving the database in an unusable state?

BTW, thanks. I removed the --full option.

Paul
Post by Andrew Sullivan
Post by Paul B. Anderson
/usr/bin/vacuumdb --all --full --analyze
^^^^^
You do realise that takes an exclusive lock, right? Why are you
doing VACUUM FULL? If your FSM is set right, you really shouldn't
need it for most cases.
A
Andrew Sullivan
2004-11-12 15:57:28 UTC
Post by Paul B. Anderson
When the vacuum full timed out, could the exclusive lock have been left
set, leaving the database in an unusable state?
No, but cancelling a VACUUM FULL takes a long time, in my experience,
and so the lock hangs around for quite a while.
Post by Paul B. Anderson
BTW, thanks. I removed the --full option.
My bet is that will help matters.

A
--
Andrew Sullivan | ***@crankycanuck.ca
In the future this spectacle of the middle classes shocking the avant-
garde will probably become the textbook definition of Postmodernism.
--Brad Holland

Vida Luz
2004-11-12 16:03:36 UTC
Hi all

I have a table in my database that has 8,000,000 rows; when I execute a
query on this table, the answer is very slow.

I have an index on this table by date. My table is:

   Column   |           Type            | Modifiers
------------+---------------------------+-----------
 nombre     | character varying(15)     |
 mensaje    | character varying(250)    |
 nombre_env | character varying(100)    |
 cel_env    | character varying(15)     |
 fecha      | date                      |
 hora       | time(0) without time zone |
Indexes: correo_fecha

My index is:
Index "correo_fecha"
Column | Type
--------+------
fecha | date
btree

My query is:

select EXTRACT(MONTH FROM M.fecha) as ho, count(*) as tot
from correo M
where EXTRACT(YEAR FROM M.fecha) = '2004'
group by EXTRACT(MONTH FROM M.fecha);

When I executed an EXPLAIN ANALYZE, I got the following answer:

Aggregate  (cost=122439.31..122558.36 rows=2381 width=4) (actual time=64626.46..76021.93 rows=11 loops=1)
  ->  Group  (cost=122439.31..122498.84 rows=23809 width=4) (actual time=63951.10..73332.27 rows=4177209 loops=1)
        ->  Sort  (cost=122439.31..122439.31 rows=23809 width=4) (actual time=63951.09..67240.94 rows=4177209 loops=1)
              ->  Seq Scan on web_sms m  (cost=0.00..120708.48 rows=23809 width=4) (actual time=0.30..55077.31 rows=4177209 loops=1)
Total runtime: 76069.22 msec

What can I do to speed up this query?

Thanks.






Jaime Casanova
2004-11-12 16:33:12 UTC
Post by Vida Luz
I have a table in my database that has 8,000,000 rows; when I execute a
query on this table, the answer is very slow.
[...]
My query is:
select EXTRACT(MONTH FROM M.fecha) as ho, count(*) as tot
from correo M
where EXTRACT(YEAR FROM M.fecha) = '2004'
group by EXTRACT(MONTH FROM M.fecha);
[...]
Total runtime: 76069.22 msec
What can I do to speed up this query?
Maybe this question should be asked on the PERFORMANCE list.

What about creating the index on the extract expression?
CREATE INDEX tuIndice ON correo (date_part('year', fecha));

I found that the other way I suggested on the Spanish list is not correct
(syntax), but this way it works (I have tested it).
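For the planner to consider such an index, the WHERE clause has to use the same
expression the index was built on; a rough sketch using the names above
(EXTRACT(YEAR FROM ...) is parsed into date_part('year', ...), so either
spelling should match):

CREATE INDEX tuIndice ON correo (date_part('year', fecha));
ANALYZE correo;

-- Filter on the indexed expression; whether the index is actually chosen still
-- depends on how selective the year restriction is.
SELECT date_part('month', fecha) AS ho, count(*) AS tot
FROM correo
WHERE date_part('year', fecha) = 2004
GROUP BY date_part('month', fecha);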

regards,
Jaime Casanova




Joshua D. Drake
2004-11-12 17:09:36 UTC
Post by Vida Luz
My query is:
select EXTRACT(MONTH FROM M.fecha) as ho, count(*) as tot
from correo M
where EXTRACT(YEAR FROM M.fecha) = '2004'
group by EXTRACT(MONTH FROM M.fecha);
[...]
Total runtime: 76069.22 msec
What can I do to speed up this query?
You could use date_part and create a functional index.
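To check whether the planner actually picks such an index up once it exists
(names assumed from the earlier posts), comparing the plan before and after
creating it is the simplest test:

EXPLAIN ANALYZE
SELECT date_part('month', fecha) AS ho, count(*) AS tot
FROM correo
WHERE date_part('year', fecha) = 2004
GROUP BY date_part('month', fecha);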

Sincerely,

Joshua D. Drake
--
Command Prompt, Inc., home of PostgreSQL Replication, and plPHP.
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - ***@commandprompt.com - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL