Discussion:
slow seqscan after vacuum analyze
Edoardo Ceccarelli
2004-02-04 21:23:42 UTC
I have a simple query that scans each record, like this:

select * from utente where luogorilasciodoc='ciao'

The execution time BEFORE vacuum is 1203ms. The execution time AFTER vacuum is 6656ms!!!
What is going on? I thought one vacuums the db to get better performance!

PostgreSQL Ver. 7.3.4 on linux

Thank you
Edoardo


scott.marlowe
2004-02-04 21:45:01 UTC
Post by Edoardo Ceccarelli
I have a simple query that scans each record, like this:
select * from utente where luogorilasciodoc='ciao'
The execution time BEFORE vacuum is 1203ms. The execution time AFTER vacuum is 6656ms!!!
What is going on? I thought one vacuums the db to get better performance!
PostgreSQL Ver. 7.3.4 on linux
Can you show us the output of "explain analyze select * from ..." for both
before and after a vacuum? i.e. with the difference in performance. If
not, just an explain analyze of the way it runs now might be enough.


Edoardo Ceccarelli
2004-02-05 00:35:12 UTC
-----Original Message-----
Sent: Wednesday, February 4, 2004 22:45
To: Edoardo Ceccarelli
Subject: Re: [ADMIN] slow seqscan after vacuum analyze
select * from utente where luogorilasciodoc='ciao'
The execution time BEFORE vacuum is 1203ms. The execution time AFTER vacuum is 6656ms!!!
What is going on? I thought one vacuums the db to get better performance!
PostgreSQL Ver. 7.3.4 on linux
Can you show us the output of "explain analyze select * from ..." for both
before and after a vacuum? i.e. with the difference in performance. If
not, just an explain analyze of the way it runs now might be enough.
BEFORE (copy of the db without having run the vacuum):

explain analyze SELECT * FROM utente where luogorilasciodoc='ciao';
                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Seq Scan on utente  (cost=0.00..40947.90 rows=1826 width=724) (actual time=131.17..856.96 rows=15 loops=1)
   Filter: (luogorilasciodoc = 'ciao'::bpchar)
 Total runtime: 857.06 msec
(3 rows)


AFTER (actual db)
portaportese=# explain analyze SELECT * FROM utente where luogorilasciodoc='ciao';
                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Seq Scan on utente  (cost=0.00..92174.50 rows=3 width=724) (actual time=705.41..6458.19 rows=15 loops=1)
   Filter: (luogorilasciodoc = 'ciao'::bpchar)
 Total runtime: 6458.29 msec
(3 rows)

Things are worse only for the seqscan; when it uses an index scan, timing is good.



Sam Barnett-Cormack
2004-02-05 00:56:42 UTC
Post by Edoardo Ceccarelli
explain analyze SELECT * FROM utente where luogorilasciodoc='ciao';
 Seq Scan on utente  (cost=0.00..40947.90 rows=1826 width=724) (actual time=131.17..856.96 rows=15 loops=1)
   Filter: (luogorilasciodoc = 'ciao'::bpchar)
 Total runtime: 857.06 msec
(3 rows)
AFTER (actual db)
portaportese=# explain analyze SELECT * FROM utente where luogorilasciodoc='ciao';
 Seq Scan on utente  (cost=0.00..92174.50 rows=3 width=724) (actual time=705.41..6458.19 rows=15 loops=1)
   Filter: (luogorilasciodoc = 'ciao'::bpchar)
 Total runtime: 6458.29 msec
(3 rows)
Things are worse only for the seqscan; when it uses an index scan, timing is good.
The only thing I can think of is if the storage method had been changed. Not sure
if that would even affect it, or if it could change by itself.
Just brainstorming.
--
Sam Barnett-Cormack
Software Developer | Student of Physics & Maths
UK Mirror Service (http://www.mirror.ac.uk) | Lancaster University

Edoardo Ceccarelli
2004-02-05 01:06:53 UTC
Post by Sam Barnett-Cormack
Post by Edoardo Ceccarelli
 Seq Scan on utente  (cost=0.00..92174.50 rows=3 width=724) (actual time=705.41..6458.19 rows=15 loops=1)
   Filter: (luogorilasciodoc = 'ciao'::bpchar)
 Total runtime: 6458.29 msec
(3 rows)
Things are worse only for the seqscan; when it uses an index scan, timing is good.
The only thing I can think of is if the storage method had been changed. Not sure
if that would even affect it, or if it could change by itself.
Just brainstorming.
Do you know how I can check whether the storage method has changed?
I was thinking that the primary goal of a vacuum operation is to reclaim disk space
- that might imply worse seqscan performance - maybe it's normal.
Anyway, I am running a VACUUM FULL ANALYZE right now to see if things get better.
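
For reference, the two commands being compared are roughly the following (table name taken from the EXPLAIN output above; a plain VACUUM only marks dead tuples as reusable space, while VACUUM FULL physically compacts the table and takes an exclusive lock to do so):

VACUUM ANALYZE utente;       -- plain: reclaims dead space for reuse, refreshes stats
VACUUM FULL ANALYZE utente;  -- full: moves tuples to shrink the file; locks the table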

Thanks for your hints
Edoardo


Edoardo Ceccarelli
2004-02-05 02:10:13 UTC
Post by Edoardo Ceccarelli
AFTER (actual db)
portaportese=# explain analyze SELECT * FROM utente where luogorilasciodoc='ciao';
 Seq Scan on utente  (cost=0.00..92174.50 rows=3 width=724) (actual time=705.41..6458.19 rows=15 loops=1)
   Filter: (luogorilasciodoc = 'ciao'::bpchar)
 Total runtime: 6458.29 msec
(3 rows)
After a VACUUM FULL ANALYZE:

portaportese=# explain analyze SELECT * FROM utente where luogorilasciodoc='ciao';
                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Seq Scan on utente  (cost=0.00..41329.21 rows=154 width=724) (actual time=91.61..751.28 rows=15 loops=1)
   Filter: (luogorilasciodoc = 'ciao'::bpchar)
 Total runtime: 751.35 msec
(3 rows)


Things are better now! :) It must have been because I had only run a plain vacuum...




David F. Skoll
2004-02-05 02:17:02 UTC
Post by Edoardo Ceccarelli
Things are worse only for the seqscan; when it uses an index scan, timing is good.
It might just be that running VACUUM flushed the operating system's
buffer cache.

I always try running two or three EXPLAIN ANALYZE's in a row for the
same query. Usually, the first one takes much longer than the rest.

Regards,

David.

Edoardo Ceccarelli
2004-02-05 02:18:31 UTC
Yes, you are right, but that wasn't the case this time; I ran the explain plenty of
times with the same results. I think the reason was that I had done a simple VACUUM;
after a VACUUM FULL ANALYZE (1h!!) things are OK.

Best Regards
Edoardo
-----Original Message-----
Sent: Thursday, February 5, 2004 3:17
To: Edoardo Ceccarelli
Subject: Re: R: [ADMIN] slow seqscan after vacuum analyze
Post by David F. Skoll
Post by Edoardo Ceccarelli
Things are worse only for the seqscan; when it uses an index scan, timing is good.
It might just be that running VACUUM flushed the operating system's buffer cache.
I always try running two or three EXPLAIN ANALYZE's in a row for the same query.
Usually, the first one takes much longer than the rest.
Regards,
David.
Christopher Browne
2004-02-05 04:18:39 UTC
Post by Edoardo Ceccarelli
Yes, you are right, but that wasn't the case this time; I ran the explain plenty
of times with the same results. I think the reason was that I had done a simple
VACUUM; after a VACUUM FULL ANALYZE (1h!!) things are OK.
It sounds as though you weren't vacuuming ("just plain vacuum") often
enough.

What tends to go wrong is when the table winds up with a LOT of empty
space due to there being a lot of updates to the table without dead
tuples being cleaned out. The table winds up big, with no way to
shrink it without the cost of a VACUUM FULL.

If you vacuumed more often, the size of the table would likely stay
smaller, which is sure to be helpful.

Another factor worth considering: If a few values are very common in
the field you are selecting on, then the query optimizer can get
convinced (wrongly) that a Seq Scan is the best choice. Using ALTER
TABLE T ALTER COLUMN C SET STATISTICS [some value] to increase the
number of "bins" can be helpful in such cases. (My pet theory is that
the present default value of 10 is a little low, and that a lot of
optimizer errors might be resolved by bumping it up a bit...)
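
Concretely - using the table and column from earlier in the thread, and a target of 100 purely as an example - that would be:

ALTER TABLE utente ALTER COLUMN luogorilasciodoc SET STATISTICS 100;
ANALYZE utente;  -- re-gather statistics so the new target takes effect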
--
(format nil "~S@~S" "cbbrowne" "ntlug.org")
http://www.ntlug.org/~cbbrowne/sgml.html
But what can you do with it? -- ubiquitous cry from Linux-user
partner. -- Andy Pearce, <***@hpopd.pwd.hp.com>
Tom Lane
2004-02-05 17:09:49 UTC
Post by Christopher Browne
Another factor worth considering: If a few values are very common in
the field you are selecting on, then the query optimizer can get
convinced (wrongly) that a Seq Scan is the best choice. Using ALTER
TABLE T ALTER COLUMN C SET STATISTICS [some value] to increase the
number of "bins" can be helpful in such cases. (My pet theory is that
the present default value of 10 is a little low, and that a lot of
optimizer errors might be resolved by bumping it up a bit...)
I also suspect that 10 is a lowball figure, but no one has done any work
to establish what might be a more reasonable default. Larger values
have real costs in both pg_statistic space and planner runtime, so I
don't want to push it up without some kind of evidence.

BTW, if you think a higher default might be appropriate, you can set it
in postgresql.conf instead of running around and manually ALTERing all
your tables.
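
(Presumably that is the default_statistics_target setting, e.g.:

default_statistics_target = 100   # planner statistics target; the current default is 10

followed by an ANALYZE so the new target actually gets used.)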

regards, tom lane

Steve Lane
2004-02-06 03:48:35 UTC
Hello all:

Question here about how best to optimize a large data load. Data load is ad
hoc and so needs to be able to run during production hours.

We have a piece of software written in a desktop RAD environment (FileMaker
Pro). Our users are using this tool to generate data that we need to get
into postgresql. We're sending it through a middleware layer written in PHP.

A single data load will generate from 10K-100K rows.

Initially we did everything procedurally through PHP. 100K inserts, each one
called through several layers of PHP abstraction. Bad idea.

Current idea is to have PHP dump the data to a file, and suck the whole file
in at once somehow. So far, so good: PHP can create the file in 6 minutes
for 100K rows. That's actually acceptable.

Now we want to use COPY to bring the data in. The target table has 6
indexes. Without indexes, naturally, we can load 80K rows in 2 seconds. With
indexes, 46 seconds. (oddly, ONE index contributes 40+ seconds of that, yet
they're all similar, single-column indexes. Anyway, that's another story).
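
(For reference, the load step itself is a one-liner. With a tab-delimited dump file - path hypothetical, table name as in the companion "index re-creation speed" post - it is roughly:

\copy test_responses_2 from '/tmp/batch_80k.dat'

run through psql, which reads the file client-side; a plain COPY ... FROM 'file' reads it server-side and requires superuser.)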

Normally, in a batch-like environment, I'd feel free to drop the indexes,
load, reindex. That's perfectly fast. But the environment needs to be live,
and those indexes are vital to a reporting engine that can be hit at any
time. So we can't just drop them, even briefly.

So I hit on the idea of doing the same thing, but inside a transaction. In
theory that should affect no one else. To my delight, the transaction
drop-copy-reindex ran in 7 seconds.
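
In outline, the transactional version looks like this (index and table names as in the companion "index re-creation speed" post; since DDL in PostgreSQL is transactional, a failure rolls the whole thing back):

BEGIN;
DROP INDEX id_item_test_key;   -- ...and likewise the other five indexes
COPY test_responses_2 FROM '/tmp/batch_80k.dat';
CREATE INDEX id_item_test_key ON test_responses_2 (id_item);
-- ...recreate the remaining indexes the same way...
COMMIT;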

I guess I'm just wondering how that's possible. I hate to sound like a
superstitious goof, but it sounds too good to be true. At best, I figured to
pay the whole penalty at the time of COMMIT -- that it would be fast up to
that point, and then of course need to do exactly the same work as the
transactionless version, as far as reconciling indexes or whatever the more
technically accurate term is.

So: is this too good to be true? Or is this actually a good way to do this?

Any other observations on the whole process? Is there a better or different
approach, or other things we should consider?

Any and all thoughts are welcome.

-- sgl


=======================================================
Steve Lane

Vice President
The Moyer Group
14 North Peoria St Suite 2H
Chicago, IL 60607

Voice: (312) 433-2421 Email: ***@moyergroup.com
Fax: (312) 850-3930 Web: http://www.moyergroup.com
=======================================================


Steve Lane
2004-02-06 04:36:52 UTC
All:

Been looking at speed issues related to loading large data sets, and
experimenting with dropping and rebuilding indexes. I noticed some
interesting behavior on the table in question. Here's the table:

Table "test_responses_2"
Attribute | Type | Modifier
-------------------+-------------------+----------
id_response | integer |
id_code | integer |
id_survey | integer |
id_topic | integer |
id_item | integer |
id_administration | integer |
response_int | smallint |
response_txt | character varying |
rec_handle | character varying |
Indices: id_administration_test_key,
id_code_test_key,
id_item_test_key,
id_response_test_key,
id_survey_test_key,
id_topic_test_key


When I drop and rebuild the indexes, they take oddly varying amounts of time
to rebuild. I rebuilt them in the following order, with the following rough
times. I took a guess that the speed of the rebuild might be related to the
number of distinct values in the column -- this seems true in some cases but
not in others. Here are the times:

id_response 38 secs (86000 distinct)
id_topic 33 secs (6 distinct)
id_survey 13 secs (1 distinct)
id_code 39 secs (1444 distinct)
id_item 40 secs (65 distinct)
id_administration 13 secs (1 distinct)

Is there anything I should be learning from this?

-- sgl

=======================================================
Steve Lane

Vice President
The Moyer Group
14 North Peoria St Suite 2H
Chicago, IL 60607

Voice: (312) 433-2421 Email: ***@moyergroup.com
Fax: (312) 850-3930 Web: http://www.moyergroup.com
=======================================================


Tom Lane
2004-02-06 06:30:15 UTC
Post by Steve Lane
When I drop and rebuild the indexes, they take oddly varying amounts of time
to rebuild. I rebuilt them in the following order, with the following rough
times. I took a guess that the speed of the rebuild might be related to the
number of distinct values in the column -- this seems true in some cases but not in others
id_response 38 secs (86000 distinct)
id_topic 33 secs (6 distinct)
id_survey 13 secs (1 distinct)
id_code 39 secs (1444 distinct)
id_item 40 secs (65 distinct)
id_administration 13 secs (1 distinct)
How many rows altogether in this table? What sort_mem are you using
(and have you tried altering it)?
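
For reference, sort_mem is measured in kilobytes in 7.x and can be raised per-session before a big index build; the value below is just an example:

SHOW sort_mem;         -- current value, in KB
SET sort_mem = 65536;  -- 64MB, for this session only
CREATE INDEX id_item_test_key ON test_responses_2 (id_item);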

When I read your previous message I was wondering about different
datatypes having different comparison costs, but since these are all
integer columns that's clearly not the issue here. I think you may
be seeing something related to the number of initial runs created in the
sorting algorithm --- with only one distinct value, there'd always be
just one run and no need for any merge passes.

(Why are you bothering to index columns with only one or a few values in
them, anyway? Such an index will be useless for searches ...)

regards, tom lane

Steve Lane
2004-02-06 07:03:48 UTC
Post by Tom Lane
Post by Steve Lane
When I drop and rebuild the indexes, they take oddly varying amounts of time
to rebuild. I rebuilt them in the following order, with the following rough
times. I took a guess that the speed of the rebuild might be related to the
number of distinct values in the column -- this seems true in some cases but
id_response 38 secs (86000 distinct)
id_topic 33 secs (6 distinct)
id_survey 13 secs (1 distinct)
id_code 39 secs (1444 distinct)
id_item 40 secs (65 distinct)
id_administration 13 secs (1 distinct)
How many rows altogether in this table? What sort_mem are you using
(and have you tried altering it)?
Sorry, left out important info. I cleaned the table first (it's a temporary
copy), then imported one run of data, 86,000 rows.

I haven't checked sort_mem, will do so.
Post by Tom Lane
When I read your previous message I was wondering about different
datatypes having different comparison costs, but since these are all
integer columns that's clearly not the issue here. I think you may
be seeing something related to the number of initial runs created in the
sorting algorithm --- with only one distinct value, there'd always be
just one run and no need for any merge passes.
(Why are you bothering to index columns with only one or a few values in
them, anyway? Such an index will be useless for searches ...)
Again, I left out some useful details. Id_survey and id_administration will
be identical for a single run of data (which can generate 10-100K inserts).
And my tests were just one data set, imported into a cleaned table. Still,
the distinctness of these columns will be low -- on the order of one
distinct value per 10^4 rows. The others have levels of distinctness
proportional to what the above chart shows -- id_response is unique,
id_topic will have 4-8 values per 10^4 records, and so on.

And still, none of this explains to me why the index on id_item drove the
COPY cost up so dramatically. I tried the COPY again both with and without
that one index, and each time that single index took the COPY from 5 to
40+ seconds.

-- sgl


Tom Lane
2004-02-06 14:50:41 UTC
Post by Steve Lane
And still, none of this explains to me why the index on id_item drove the
COPY cost up so apparently dramatically. I tried the COPY again both with
and without that one index and in both cases it caused COPY to go from 5 to
40+ seconds.
That doesn't make a lot of sense to me either, particularly seeing that
id_item seems to be an intermediate case as far as uniqueness goes.
Do you still get 30+ seconds to COPY if that is the *only* index?

If the data you're using isn't too sensitive, would you mind sending me
a copy (off-list)? I'm interested to try profiling this behavior.

regards, tom lane

Tom Lane
2004-02-06 06:23:01 UTC
Post by Steve Lane
Now we want to use COPY to bring the data in. The target table has 6
indexes. Without indexes, naturally, we can load 80K rows in 2 seconds. With
indexes, 46 seconds. (oddly, ONE index contributes 40+ seconds of that, yet
they're all similar, single-column indexes. Anyway, that's another story).
No, I'd say that's the key part of the story. Details?
Post by Steve Lane
So I hit on the idea of doing the same thing, but inside a transaction. In
theory that should affect no one else.
... other than locking them out of the table while the transaction runs.
That doesn't sound like what you want to do. In any case, reindexing
the table will get slower and slower as the pre-existing data in the
table expands.

regards, tom lane

Steve Lane
2004-02-06 06:54:01 UTC
Post by Tom Lane
Post by Steve Lane
Now we want to use COPY to bring the data in. The target table has 6
indexes. Without indexes, naturally, we can load 80K rows in 2 seconds. With
indexes, 46 seconds. (oddly, ONE index contributes 40+ seconds of that, yet
they're all similar, single-column indexes. Anyway, that's another story).
No, I'd say that's the key part of the story. Details?
I have another post out there called "index re-creation speed" that delves
into this. Here's the table:

Table "test_responses_2"
Attribute | Type | Modifier
-------------------+-------------------+----------
id_response | integer |
id_code | integer |
id_survey | integer |
id_topic | integer |
id_item | integer |
id_administration | integer |
response_int | smallint |
response_txt | character varying |
rec_handle | character varying |
Indices: id_administration_test_key,
id_code_test_key,
id_item_test_key,
id_response_test_key,
id_survey_test_key,
id_topic_test_key

Id_item_test_key is the one that drives up the COPY time dramatically.
65 distinct values in that column for the given data set.
Post by Tom Lane
Post by Steve Lane
So I hit on the idea of doing the same thing, but inside a transaction. In
theory that should affect no one else.
... other than locking them out of the table while the transaction runs.
Ha! I knew that that tasty-looking lunch had to have a price tag.
Post by Tom Lane
That doesn't sound like what you want to do. In any case, reindexing
the table will get slower and slower as the pre-existing data in the
table expands.
Yes, I've been running some tests and it stops being very acceptable around
3-5M rows.

So am I more or less out of luck? The issue, again, is how to insert at most
80-100K rows into a running system, at a time determined by users, into a
fairly heavily-indexed table.

I thought of dropping the indexes and deferring index recreation -- maybe
even having a very frequent cron job rebuild the indexes. All that buys me
is being able to return control to the initiating user quickly. It still has
the issue of expanding reindex time and, I'm guessing, the issue of locking
out other users as well.

Am I simply asking too much of my tools here?

-- sgl



Iain
2004-02-05 06:32:29 UTC
Post by Edoardo Ceccarelli
Yes, you are right, but that wasn't the case this time; I ran the explain plenty of
times with the same results. I think the reason was that I had done a simple VACUUM;
after a VACUUM FULL ANALYZE (1h!!) things are OK.
It's reasonable to expect that a seq scan will perform faster after a full
vacuum, as the physical size and organization of the table have been changed.

I wouldn't expect a plain vacuum to have any tangible effect on performance,
for better or for worse.

I'd like to know more about the possibility of plain vacuums harming
performance. This is the first I've heard of it. Vacuum full is not always
an option in a production environment.


Christopher Browne
2004-02-05 13:15:24 UTC
Post by Iain
I'd like to know more about the possibility of plain vacuums harming
performance. This is the first I've heard of it. Vacuum full is not always
an option in a production environment.
There certainly are known cases where systems whose I/O bus is
already pretty well saturated will suffer BADLY when a big vacuum is
thrown at them.

The problem in such cases is that the vacuum draws the pages that it
is working on into the buffer cache, pushing out data that is actually
useful to cache.

There are experimental patches for 7.4, 7.3, and even, I believe, 7.2,
for a "sleeping vacuum" that tries to limit the damage by sleeping
every so often so that the vacuum does not dominate, and so that
"ordinary traffic" gets a chance to reclaim cache. And there are
efforts in 7.5 to improve cache management, so that pages brought in
by VACUUM would be put at the opposite end of the "LRU" queue. That
way, instead of them being treated as Most Recently Used, pushing
everything the least bit older towards being dropped from the buffer
cache, the vacuumed pages would be treated as if they were LRU, so
they would get evicted FIRST.

But if the Original Poster is finding that the database is doing
Seq Scans when it would be better to do an Index Scan, that is a
separate problem, and focusing on the VACUUM may distract from the
_real_ problem...
--
let name="cbbrowne" and tld="cbbrowne.com" in String.concat "@" [name;tld];;
http://www.ntlug.org/~cbbrowne/advocacy.html
Rules of the Evil Overlord #195. "I will not use hostages as bait in a
trap. Unless you're going to use them for negotiation or as human
shields, there's no point in taking them."
<http://www.eviloverlord.com/>
Edoardo Ceccarelli
2004-02-05 17:00:52 UTC
Post by Christopher Browne
But if the Original Poster is finding that the database is doing Seq Scans
when it would be better to do an Index Scan, that is a separate problem,
and focusing on the VACUUM may distract from the _real_ problem...
I have only noticed that, after a VACUUM ANALYZE of the db, the time of a seq scan
(when a seqscan is required) increased by a factor of ten; that is my primary focus,
otherwise I would have posted something like "my select is really slow ..."

To be more precise:
Given a query that HAS to be executed with a seqscan, I have noticed an increase in
time comparing before and after the vacuum.

Anyway, I am working to recreate the same situation so I can post some output from
the verbose option of the vacuum.

Edoardo


Tom Lane
2004-02-05 19:33:45 UTC
Post by Edoardo Ceccarelli
Given a query that HAS to be executed with a seqscan I have noticed an
increase in time comparing before and after the vacuum.
This is really hard to believe --- I cannot think of any mechanism that
would result in that effect. Unless the vacuum were flushing the
kernel's disk cache, but the effects of that would only persist for one
scan. You did say that the increased time is repeatable if you do
multiple seqscans after the vacuum?

regards, tom lane

Edoardo
2004-02-06 00:12:56 UTC
Post by Tom Lane
Post by Edoardo Ceccarelli
Given a query that HAS to be executed with a seqscan I have noticed an
increase in time comparing before and after the vacuum.
This is really hard to believe --- I cannot think of any mechanism that
would result in that effect. Unless the vacuum were flushing the
kernel's disk cache, but the effects of that would only persist for one
scan. You did say that the increased time is repeatable if you do
multiple seqscans after the vacuum?
regards, tom lane
Yes, I can assure you it was repeatable, and it disappeared only after a
VACUUM FULL ANALYZE.

It was something really stable in its bad behaviour. I am going to run some
tests soon and will post the results here.

Best Regards
Edoardo

--
The net spotter (http://expot.it)
Open WebMail Project (http://openwebmail.org)


Andrew Biagioni
2004-02-05 16:23:34 UTC
Post by Iain
Post by Edoardo Ceccarelli
Yes, you are right, but that wasn't the case this time; I ran the explain plenty
of times with the same results. I think the reason was that I had done a simple
VACUUM; after a VACUUM FULL ANALYZE (1h!!) things are OK.
It's reasonable to expect that a seq scan will perform faster after a full
vacuum, as the physical size and organization of the table have been changed.
I wouldn't expect a plain vacuum to have any tangible effect on performance,
for better or for worse.
A plain VACUUM (without ANALYZE) will change the layout of the data
without refreshing the optimizer information, so that anything that DOES
use the optimizer will often be negatively affected; VACUUM ANALYZE
does a vacuum AND refreshes the optimizer information.

I usually run VACUUM ANALYZE on an hourly basis on our production
system, and it's fast enough and unobtrusive enough that I can't tell
that it's running (except by looking at the log, of course).
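
An hourly schedule like that is just a crontab entry - the database name here is hypothetical:

# run VACUUM ANALYZE at the top of every hour
0 * * * *  vacuumdb --analyze --quiet mydb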
Post by Iain
I'd like to know more about the possibility of plain vacuums harming
performance. This is the first I've heard of it. Vacuum full is not always
an option in a production environment.
Stefan Sturm
2004-02-05 17:08:07 UTC
Hello,

I have a question about vacuum. There are three versions of vacuum:
1.) plain vacuum
2.) vacuum analyze
3.) full vacuum

Is it right that a vacuum analyze includes the plain vacuum and the full
vacuum includes the vacuum analyze? And which vacuum locks the tables?

And is there a strategy for when to use which vacuum? I think I should do a full
vacuum every night and a vacuum analyze a few times a day.

Is this OK?

Greetings,
Stefan Sturm



Sam Barnett-Cormack
2004-02-05 17:22:32 UTC
Post by Stefan Sturm
Hello,
1.) plain vacuum
2.) vacuum analyze
3.) full vacuum
Is it right that a vacuum analyze includes the plain vacuum and the full
vacuum includes the vacuum analyze? And which vacuum locks the tables?
'full' is an option to vacuum, essentially, as is 'analyze', so there
are four variants (ignoring the freeze option):

vacuum
vacuum analyze
vacuum full
vacuum full analyze

It's all pretty obvious from there.
--
Sam Barnett-Cormack
Software Developer | Student of Physics & Maths
UK Mirror Service (http://www.mirror.ac.uk) | Lancaster University

Sam Barnett-Cormack
2004-02-05 15:00:27 UTC
Post by Edoardo Ceccarelli
after a VACUUM FULL ANALYZE (1h!!) things are ok
Only an hour - lucky you ;)
--
Sam Barnett-Cormack
Software Developer | Student of Physics & Maths
UK Mirror Service (http://www.mirror.ac.uk) | Lancaster University

Edoardo Ceccarelli
2004-02-05 18:50:31 UTC
Well, maybe it was more, can't remember, it was at 3am! :)
-----Original Message-----
From: Sam Barnett-Cormack
Sent: Thursday, February 5, 2004 16:00
To: Edoardo Ceccarelli
Subject: Re: R: R: [ADMIN] slow seqscan after vacuum analyze
Post by Edoardo Ceccarelli
after a VACUUM FULL ANALYZE (1h!!) things are OK
Only an hour - lucky you ;)
--
Sam Barnett-Cormack
Software Developer | Student of Physics & Maths
UK Mirror Service (http://www.mirror.ac.uk) | Lancaster University