VARCHAR -vs- CHAR: huge performance difference?

Discussion:

VARCHAR -vs- CHAR: huge performance difference?

(too old to reply)

C. Bensend

2004-06-16 00:58:02 UTC

Hey folks,

I am working on a rather small, simple database. I'm running 7.3.5 on
an OpenBSD 3.5-STABLE machine (1.3GHz, 512M RAM, IDE drive using UDMA-5).

I am parsing and storing emails, so I have a lot of character string
data to worry about. In one particular table, I have 26 columns of type
varchar, with widths of 24-8192 characters. This is the table that is
giving me some pretty severe performance problems.

For example, many of the statistics I run against these emails is a
simple count(), whether it be on a column or *. Right now, counting the
rows takes in excess of 3.5 seconds. OK, that's pretty slow. However,
there are only --> 24,000 <-- rows in this table! I could see it taking
a long time if I had 500,000 rows, or a million, but 24,000?

Now, I am not an expert with database design whatsoever, so I fully
admit that having a table with 26 varchar columns (along with a handful
of other fixed-width columns) probably isn't the brightest design. :)
So, I went ahead and created an exact copy of this table, with the
exception of creating all character columns as type char(), not varchar().
I was pondering if making PostgreSQL worry about the varying lengths
by using varchar was the problem...

And sure enough, counting the rows on the new table takes around
148ms. That's a pretty big difference from 3600ms. And no, 3.6 seconds
doesn't sound like much, until you have several operations on the same
table to draw a single web page, and suddenly the page takes 20s to
load. :(

So, my question for the gurus: is using varchars THAT detrimental?
Or am I just going way way overboard by having so _many_ in a single
table? Or am I an idiot? (honest question - I'm armpit-deep in learning
mode here) I'm also curious to know if I've crossed some invisible
line with the number of columns/width of rows that makes performance
degrade rapidly.

If further info is needed, please ask - I just didn't want to spam
the list with further table definitions and explain analyze output if
it wasn't needed.

Benny

--
"Oh, the Jedis are going to feel this one!" -- Professor Farnsworth,
"Futurama"

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Scott Marlowe

2004-06-16 01:23:06 UTC

Post by C. Bensend
Hey folks,
I am working on a rather small, simple database. I'm running 7.3.5 on
an OpenBSD 3.5-STABLE machine (1.3GHz, 512M RAM, IDE drive using UDMA-5).
I am parsing and storing emails, so I have a lot of character string
data to worry about. In one particular table, I have 26 columns of type
varchar, with widths of 24-8192 characters. This is the table that is
giving me some pretty severe performance problems.
For example, many of the statistics I run against these emails is a
simple count(), whether it be on a column or *. Right now, counting the
rows takes in excess of 3.5 seconds. OK, that's pretty slow. However,
there are only --> 24,000 <-- rows in this table! I could see it taking
a long time if I had 500,000 rows, or a million, but 24,000?
Now, I am not an expert with database design whatsoever, so I fully
admit that having a table with 26 varchar columns (along with a handful
of other fixed-width columns) probably isn't the brightest design. :)
So, I went ahead and created an exact copy of this table, with the
exception of creating all character columns as type char(), not varchar().
I was pondering if making PostgreSQL worry about the varying lengths
by using varchar was the problem...
And sure enough, counting the rows on the new table takes around
148ms. That's a pretty big difference from 3600ms. And no, 3.6 seconds
doesn't sound like much, until you have several operations on the same
table to draw a single web page, and suddenly the page takes 20s to
load. :(

Were those fields populated just like the varchar fields? If not, then
the test proves little. If so, I find it hard to believe that char(x)
would be any faster than varchar. They're all handled about the same.

If you want to do count(*) on the table, do it by having a table with
nothing but IDs in it that is referenced by the table with all the
text. PostgreSQL can't really optimized aggregate functions with
indexes, so it always winds up doing seq scans.

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to ***@postgresql.org)

C. Bensend

2004-06-16 01:31:56 UTC

Post by Scott Marlowe
Were those fields populated just like the varchar fields? If not, then
the test proves little. If so, I find it hard to believe that char(x)
would be any faster than varchar. They're all handled about the same.

Hi Scott,

Yes, the new table was populated from the data from the original, via
a:

INSERT INTO emails2 SELECT * FROM emails;

This should be correct, yes?

Post by Scott Marlowe
If you want to do count(*) on the table, do it by having a table with
nothing but IDs in it that is referenced by the table with all the
text. PostgreSQL can't really optimized aggregate functions with
indexes, so it always winds up doing seq scans.

I have also tried doing a count(column) on the emails table, using a
column that is indexed. It showed no improvement - I would have
expected at least a little gain. And here's the clincher - when I do
a count(*) on a different table with the same number of rows but only
four varchar columns, it returns the result in 75ms. (!!!)

Benny

--
"Oh, the Jedis are going to feel this one!" -- Professor Farnsworth,
"Futurama"

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

C. Bensend

2004-06-16 02:12:29 UTC

Post by C. Bensend
So, I went ahead and created an exact copy of this table, with the
exception of creating all character columns as type char(), not varchar().
I was pondering if making PostgreSQL worry about the varying lengths
by using varchar was the problem...

The above transformation is a guaranteed loser in Postgres.

Hi Tom,

By transformation, do you mean the varchar() -> char() change? If
so, I'm not sure I understand - it certainly improved the performance.
Or am I misunderstanding?

I'm betting that the original table is physically huge because you've
not vacuumed it regularly. The copying produced a table with no wasted
space, so physically smaller even though the live data is noticeably
bigger (because of all the padding blanks you forced to be added).
Check what VACUUM VERBOSE has to say about each of these tables...

Actually, all databases on this server are vacuumed nightly, right
before backups. But here is the data:

prod01=> vacuum verbose emails;
INFO: --Relation public.emails--
INFO: Index emails_email_id_idx: Pages 358; Tuples 24198: Deleted 82.
CPU 0.03s/0.01u sec elapsed 0.41 sec.
INFO: Index emails_date_received_idx: Pages 325; Tuples 24198: Deleted 82.
CPU 0.00s/0.00u sec elapsed 0.63 sec.
INFO: Removed 82 tuples in 23 pages.
CPU 0.00s/0.00u sec elapsed 0.06 sec.
INFO: Pages 5793: Changed 0, Empty 0; Tup 24198: Vac 82, Keep 0, UnUsed
71757.
Total CPU 0.24s/0.06u sec elapsed 4.71 sec.
INFO: --Relation pg_toast.pg_toast_399420--
INFO: Pages 0: Changed 0, Empty 0; Tup 0: Vac 0, Keep 0, UnUsed 0.
Total CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
prod01=> vacuum verbose emails2;
INFO: --Relation public.emails2--
INFO: Pages 2646: Changed 0, Empty 0; Tup 24162: Vac 0, Keep 0, UnUsed 0.
Total CPU 0.10s/0.00u sec elapsed 1.00 sec.
INFO: --Relation pg_toast.pg_toast_859969--
INFO: Pages 0: Changed 0, Empty 0; Tup 0: Vac 0, Keep 0, UnUsed 0.
Total CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM

Thanks very much,

Benny

--
"Oh, the Jedis are going to feel this one!" -- Professor Farnsworth,
"Futurama"

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Tom Lane

2004-06-16 02:51:52 UTC

Post by C. Bensend

The above transformation is a guaranteed loser in Postgres.

By transformation, do you mean the varchar() -> char() change?

Right. The only thing that will do for you is waste space with padding
blanks. The performance improvement you saw was due to something else,
which I theorize is not having to waste time reading dead space.

Since you obviously doubt this assertion, try the same experiment except
load the data into a new table with the *same* structure as the original
(ie, all varchar). You'll see the same or probably better performance.

Post by C. Bensend
Actually, all databases on this server are vacuumed nightly, right
before backups.

Not often enough evidently...

Post by C. Bensend
INFO: --Relation public.emails--
INFO: Pages 5793: Changed 0, Empty 0; Tup 24198: Vac 82, Keep 0, UnUsed
71757.
INFO: --Relation public.emails2--
INFO: Pages 2646: Changed 0, Empty 0; Tup 24162: Vac 0, Keep 0, UnUsed 0.

Note the difference in "pages". emails is nearly double the physical
size, even though the live data in it is doubtless smaller. (You could
use the contrib/pgstattuple functions to see exactly how much live data
there is.)

I was actually expecting to see more than a 2:1 difference in file size,
seeing that you reported more than a 2:1 difference in read time. It
could be that there is also some question of physical layout of the file
on disk. The original table probably accreted over time and was given
space that's not very consecutive on disk. The copied table was
written in one swoop, more or less, and is very likely stored in a more
nearly consecutive set of physical disk blocks. I'm not sure how you'd
investigate this theory though --- AFAIK there isn't any real easy way
to find out how badly fragmented a file is in most Unix filesystems.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to ***@postgresql.org)

C. Bensend

2004-06-16 03:09:29 UTC

Post by Tom Lane
Right. The only thing that will do for you is waste space with padding
blanks. The performance improvement you saw was due to something else,
which I theorize is not having to waste time reading dead space.
Since you obviously doubt this assertion, try the same experiment except
load the data into a new table with the *same* structure as the original
(ie, all varchar). You'll see the same or probably better performance.

Hi Tom,

I don't doubt your assertion, I just don't understand it all yet. :)
Let me assure you, you're the expert here, not I.

I did as you suggested - I created a third copy of the table, using
the exact same structure. And you're absolutely right - it was lightning
fast (around 100ms).

Post by Tom Lane

Post by C. Bensend
Actually, all databases on this server are vacuumed nightly, right
before backups.

Not often enough evidently...

This statement worries me a bit. The data in the original table was
bulk-loaded the other night (less than three days I'm sure), and new
entries have been added at the approximate rate of 300 per day. Is this
going to continue to happen?

Or do I just need to vacuum more often? I _did_ try a vacuum before
asking the list for help, but it didn't give any improvement (just a
vacuum analyze).

Post by Tom Lane
Note the difference in "pages". emails is nearly double the physical
size, even though the live data in it is doubtless smaller. (You could
use the contrib/pgstattuple functions to see exactly how much live data
there is.)

OK. I see (and understand) the pages value now.

Post by Tom Lane
I was actually expecting to see more than a 2:1 difference in file size,
seeing that you reported more than a 2:1 difference in read time. It
could be that there is also some question of physical layout of the file
on disk. The original table probably accreted over time and was given
space that's not very consecutive on disk. The copied table was
written in one swoop, more or less, and is very likely stored in a more
nearly consecutive set of physical disk blocks. I'm not sure how you'd
investigate this theory though --- AFAIK there isn't any real easy way
to find out how badly fragmented a file is in most Unix filesystems.

Ugh. So, what would you recommend as a fix? I see the problem, and I
see the fix that just worked, but I certainly can't be the only person
around that is using a "wide" table with a lot of character data being
added at a rather slow rate...

Thank you _very much_ for all your help,

Benny

--
"Oh, the Jedis are going to feel this one!" -- Professor Farnsworth,
"Futurama"

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ***@postgresql.org

Iain

2004-06-16 04:56:11 UTC

Post by C. Bensend
Or do I just need to vacuum more often? I _did_ try a vacuum before
asking the list for help, but it didn't give any improvement (just a
vacuum analyze).

"vacuum analyse" allows dead space to be re-used, but doesn't compact the
table, you need "vacuum analyse full" for that.

I'm not sure how the free space map relates to this, maybe someone can shed
some light, but if it is too small, not all dead rows can be re-used (as I
understand it). If your table (or even the DB in general) is updated alot,
the default FSM size that comes with 7.3 may not be enough to ensure the
most efficient space reclamation. Last time I looked the docs were a little
hazy on that.

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Scott Marlowe

2004-06-16 05:31:11 UTC

Post by C. Bensend

Post by Tom Lane
Right. The only thing that will do for you is waste space with padding
blanks. The performance improvement you saw was due to something else,
which I theorize is not having to waste time reading dead space.
Since you obviously doubt this assertion, try the same experiment except
load the data into a new table with the *same* structure as the original
(ie, all varchar). You'll see the same or probably better performance.

Hi Tom,
I don't doubt your assertion, I just don't understand it all yet. :)
Let me assure you, you're the expert here, not I.
I did as you suggested - I created a third copy of the table, using
the exact same structure. And you're absolutely right - it was lightning
fast (around 100ms).

Post by Tom Lane

Post by C. Bensend
Actually, all databases on this server are vacuumed nightly, right
before backups.

Not often enough evidently...

This statement worries me a bit. The data in the original table was
bulk-loaded the other night (less than three days I'm sure), and new
entries have been added at the approximate rate of 300 per day. Is this
going to continue to happen?
Or do I just need to vacuum more often? I _did_ try a vacuum before
asking the list for help, but it didn't give any improvement (just a
vacuum analyze).

Post by Tom Lane
Note the difference in "pages". emails is nearly double the physical
size, even though the live data in it is doubtless smaller. (You could
use the contrib/pgstattuple functions to see exactly how much live data
there is.)

OK. I see (and understand) the pages value now.

Post by Tom Lane
I was actually expecting to see more than a 2:1 difference in file size,
seeing that you reported more than a 2:1 difference in read time. It
could be that there is also some question of physical layout of the file
on disk. The original table probably accreted over time and was given
space that's not very consecutive on disk. The copied table was
written in one swoop, more or less, and is very likely stored in a more
nearly consecutive set of physical disk blocks. I'm not sure how you'd
investigate this theory though --- AFAIK there isn't any real easy way
to find out how badly fragmented a file is in most Unix filesystems.

Ugh. So, what would you recommend as a fix? I see the problem, and I
see the fix that just worked, but I certainly can't be the only person
around that is using a "wide" table with a lot of character data being
added at a rather slow rate...

You might want to look into the autovacuum daemon, and / or increasing
fsm settings to be large enough to hold all the spare tuples released by
vacuuming.

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

C. Bensend

2004-06-16 14:37:58 UTC

Post by Scott Marlowe
You might want to look into the autovacuum daemon, and / or increasing
fsm settings to be large enough to hold all the spare tuples released by
vacuuming.

IIRC, the autovacuum stuff was added in 7.4, which I'm not running (yet).
An upgrade to 7.4.3 might be prudent for me, while the database is still
small.

Benny

--
"Oh, the Jedis are going to feel this one!" -- Professor Farnsworth,
"Futurama"

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Olivier Hubaut

2004-06-17 06:46:47 UTC

On Wed, 16 Jun 2004 09:37:58 -0500 (CDT), C. Bensend

Post by C. Bensend

Post by Scott Marlowe
You might want to look into the autovacuum daemon, and / or increasing
fsm settings to be large enough to hold all the spare tuples released by
vacuuming.

IIRC, the autovacuum stuff was added in 7.4, which I'm not running (yet).
An upgrade to 7.4.3 might be prudent for me, while the database is still
small.
Benny

By the way, be carefull. The pg_autovacuum is not good enough for being
using as the only way for cleaning the database, especially if you have a
lot of update and delete on the same tables.

--
Downloading signature ... 99%
*CRC FAILED*
signature aborted

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ***@postgresql.org

C. Bensend

2004-06-17 14:58:52 UTC

Post by Olivier Hubaut
By the way, be carefull. The pg_autovacuum is not good enough for being
using as the only way for cleaning the database, especially if you have a
lot of update and delete on the same tables.

For right now, I'm just vacuuming more often to see how things go. I
didn't think I was even generating enough transactions for it to be a
problem, but a 'vacuum full' made a night-and-day difference.

If it continues to be problematic, I'll probably look at an upgrade.

Benny

--
"Oh, the Jedis are going to feel this one!" -- Professor Farnsworth,
"Futurama"

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Scott Marlowe

2004-06-17 15:37:36 UTC

Post by Olivier Hubaut
On Wed, 16 Jun 2004 09:37:58 -0500 (CDT), C. Bensend

Post by C. Bensend

Post by Scott Marlowe
You might want to look into the autovacuum daemon, and / or increasing
fsm settings to be large enough to hold all the spare tuples released by
vacuuming.

IIRC, the autovacuum stuff was added in 7.4, which I'm not running (yet).
An upgrade to 7.4.3 might be prudent for me, while the database is still
small.
Benny

By the way, be carefull. The pg_autovacuum is not good enough for being
using as the only way for cleaning the database, especially if you have a
lot of update and delete on the same tables.

The problem here isn't pg_autovacuum, but too small of settings for
fsm. I've run multi-day tests where autovacuum kept the size of the
database pretty much the same with 200+ updates a second going on.

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ***@postgresql.org

C. Bensend

2004-06-17 15:57:52 UTC

Post by Scott Marlowe
The problem here isn't pg_autovacuum, but too small of settings for
fsm. I've run multi-day tests where autovacuum kept the size of the
database pretty much the same with 200+ updates a second going on.

Hi Scott,

Could you explain the fsm a little more? I have done _no_ tuning on
my database simply because I don't know how, but I would be interested
to hear your recommendations for the fsm settings.

Thanks!

Benny

--
"Oh, the Jedis are going to feel this one!" -- Professor Farnsworth,
"Futurama"

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Scott Marlowe

2004-06-17 23:50:48 UTC

Post by C. Bensend

Post by Scott Marlowe
The problem here isn't pg_autovacuum, but too small of settings for
fsm. I've run multi-day tests where autovacuum kept the size of the
database pretty much the same with 200+ updates a second going on.

Hi Scott,
Could you explain the fsm a little more? I have done _no_ tuning on
my database simply because I don't know how, but I would be interested
to hear your recommendations for the fsm settings.

OK, Here's the basics of what fsm does. As you probably already know,
PostgreSQL uses an in store multiple versioning system, where the first
version of a tuple points to next newest version, and so on, until we
get to the last version. So, if you update the same row four times,
you'll have five copies of it in the data storage files, the original,
plus four copies.

Each transaction that needs to access said tuple will grab the first
one, check the date stamp on it, and go on the the next one, until it
finds the last valid one for it, based on the rules of visibility. We
won't get into that right now, because what's important is that we have
5 versions of that tuple in memory.

Now, after all the currently processing transactions finish, all the new
transactions are going to be seeing the fifth version of the tuple from
then on. But they'll all start by looking at the first one and working
forward one at a time.

Along comes vacuum and vacuum full. Vacuum full marks all these dead
tuples, then actually compresses the data file back down to save the
actual lost space. Plain (or lazy) vacuums simply mark the rows as
free, without actually reclaiming them. The free space map is what
vacuum uses to mark those tuples as free, and what the storage manager
checks to find free space to place new tuples in the tables.

So, the free space map needs to be big enough to hold a reference to
every single freed row from vacuuming, or the vacuumed tuple space will
not get reused, and the storage manager will simply append new tuples
onto the end of the data file.

You can find a bit more on this subject at:

http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Iain

2004-06-21 01:17:09 UTC

There is some information on www.varlena.com I think it may have been Scott
who wrote the article that I'm thinking about. There is other information
about, your best bet may be google.

Regards
Iain
----- Original Message -----
From: "C. Bensend" <***@bennyvision.com>
To: <pgsql-***@postgresql.org>
Sent: Friday, June 18, 2004 12:57 AM
Subject: Re: [ADMIN] VARCHAR -vs- CHAR: huge performance difference?

Post by C. Bensend

Post by Scott Marlowe
The problem here isn't pg_autovacuum, but too small of settings for
fsm. I've run multi-day tests where autovacuum kept the size of the
database pretty much the same with 200+ updates a second going on.

Hi Scott,
Could you explain the fsm a little more? I have done _no_ tuning on
my database simply because I don't know how, but I would be interested
to hear your recommendations for the fsm settings.
Thanks!
Benny
--
"Oh, the Jedis are going to feel this one!" -- Professor Farnsworth,
"Futurama"
---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Iain

2004-06-16 03:17:57 UTC

Hi Benny,

What happens if you recreate the table using varchar? I mean use the same
procedure yoused to create the table with char, but leave the definitions as
varchar.

Personally, I can't see any logical reason for char being faster. The
problem is the size of the row, each row spans multiple database pages and
all have to be read to do a count. It's a lot of IO, and varchar should
actually be more compact in your case.

I would think that the speedup you see is a result of the table having a
fresh new organization. Anyway, I'm interested to hear what happens if you
do the test above.

I liked Scot's suggestion of using a small table containing only the IDs to
use for counting. A serial scan of this small table could be expected to be
much faster. Postgres doesnt use an index in the case you specified, as I
understand it.

Regards
Iain
----- Original Message -----
From: "C. Bensend" <***@bennyvision.com>
To: <pgsql-***@postgresql.org>
Sent: Wednesday, June 16, 2004 11:12 AM
Subject: Re: [ADMIN] VARCHAR -vs- CHAR: huge performance difference?

Post by C. Bensend

Post by C. Bensend
So, I went ahead and created an exact copy of this table, with the
exception of creating all character columns as type char(), not varchar().
I was pondering if making PostgreSQL worry about the varying lengths
by using varchar was the problem...

The above transformation is a guaranteed loser in Postgres.

Hi Tom,
By transformation, do you mean the varchar() -> char() change? If
so, I'm not sure I understand - it certainly improved the performance.
Or am I misunderstanding?

I'm betting that the original table is physically huge because you've
not vacuumed it regularly. The copying produced a table with no wasted
space, so physically smaller even though the live data is noticeably
bigger (because of all the padding blanks you forced to be added).
Check what VACUUM VERBOSE has to say about each of these tables...

Actually, all databases on this server are vacuumed nightly, right
prod01=> vacuum verbose emails;
INFO: --Relation public.emails--
INFO: Index emails_email_id_idx: Pages 358; Tuples 24198: Deleted 82.
CPU 0.03s/0.01u sec elapsed 0.41 sec.
INFO: Index emails_date_received_idx: Pages 325; Tuples 24198: Deleted 82.
CPU 0.00s/0.00u sec elapsed 0.63 sec.
INFO: Removed 82 tuples in 23 pages.
CPU 0.00s/0.00u sec elapsed 0.06 sec.
INFO: Pages 5793: Changed 0, Empty 0; Tup 24198: Vac 82, Keep 0, UnUsed
71757.
Total CPU 0.24s/0.06u sec elapsed 4.71 sec.
INFO: --Relation pg_toast.pg_toast_399420--
INFO: Pages 0: Changed 0, Empty 0; Tup 0: Vac 0, Keep 0, UnUsed 0.
Total CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
prod01=> vacuum verbose emails2;
INFO: --Relation public.emails2--
INFO: Pages 2646: Changed 0, Empty 0; Tup 24162: Vac 0, Keep 0, UnUsed 0.
Total CPU 0.10s/0.00u sec elapsed 1.00 sec.
INFO: --Relation pg_toast.pg_toast_859969--
INFO: Pages 0: Changed 0, Empty 0; Tup 0: Vac 0, Keep 0, UnUsed 0.
Total CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
Thanks very much,
Benny
--
"Oh, the Jedis are going to feel this one!" -- Professor Farnsworth,
"Futurama"
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Tom Lane

2004-06-16 02:07:49 UTC

Post by C. Bensend
So, I went ahead and created an exact copy of this table, with the
exception of creating all character columns as type char(), not varchar().
I was pondering if making PostgreSQL worry about the varying lengths
by using varchar was the problem...

The above transformation is a guaranteed loser in Postgres.

Post by C. Bensend
And sure enough, counting the rows on the new table takes around
148ms. That's a pretty big difference from 3600ms.

I'm betting that the original table is physically huge because you've
not vacuumed it regularly. The copying produced a table with no wasted
space, so physically smaller even though the live data is noticeably
bigger (because of all the padding blanks you forced to be added).

Check what VACUUM VERBOSE has to say about each of these tables...

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ***@postgresql.org so that your
message can get through to the mailing list cleanly

V i s h a l Kashyap @ [Sai Hertz And Control Systems]

2004-06-16 12:52:38 UTC

Post by C. Bensend
I am parsing and storing emails, so I have a lot of character string
data to worry about. In one particular table, I have 26 columns of type
varchar, with widths of 24-8192 characters. This is the table that is
giving me some pretty severe performance problems.
For example, many of the statistics I run against these emails is a
simple count(), whether it be on a column or *. Right now, counting the
rows takes in excess of 3.5 seconds. OK, that's pretty slow. However,
there are only --> 24,000 <-- rows in this table! I could see it taking
a long time if I had 500,000 rows, or a million, but 24,000?

For similar post someone suggested to do a count like

select count(*) from table where any = any limit 1 ;

Try this look if thiers any performance increase.
--
Regards,
Vishal Kashyap
Director / Lead Software Developer,
Sai Hertz And Control Systems Pvt Ltd,
http://saihertz.rediffblogs.com
Yahoo IM: mailforvishal[ a t ]yahoo.com

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

17 Replies
1 View
Permalink to this page
Disable enhanced parsing

Thread Navigation

C. Bensend 2004-06-16 00:58:02 UTC

Scott Marlowe 2004-06-16 01:23:06 UTC

C. Bensend 2004-06-16 01:31:56 UTC

C. Bensend 2004-06-16 02:12:29 UTC

Tom Lane 2004-06-16 02:51:52 UTC

C. Bensend 2004-06-16 03:09:29 UTC

Iain 2004-06-16 04:56:11 UTC

Scott Marlowe 2004-06-16 05:31:11 UTC

C. Bensend 2004-06-16 14:37:58 UTC

Olivier Hubaut 2004-06-17 06:46:47 UTC

C. Bensend 2004-06-17 14:58:52 UTC

Scott Marlowe 2004-06-17 15:37:36 UTC

C. Bensend 2004-06-17 15:57:52 UTC

Scott Marlowe 2004-06-17 23:50:48 UTC

Iain 2004-06-21 01:17:09 UTC

Iain 2004-06-16 03:17:57 UTC

Tom Lane 2004-06-16 02:07:49 UTC

V i s h a l Kashyap @ [Sai Hertz And Control Systems] 2004-06-16 12:52:38 UTC

about - legalese

Loading...