Discussion:
please please please PLEASE help!
Steve
2004-07-23 06:10:55 UTC
Hi,

I've asked this question a couple of times before on this forum, but no
one has been kind enough to point me in the right direction or help me
out with any information. Please help me out with this, because it is a
very serious issue for me and I need to learn more about it. Here it is
again:

I've been running postgres on my server for over a year now and the
tables have become huge. I have 3 tables that each hold over 10GB of
data, and these tables are read very, very frequently. In fact, heavy
searches on these tables are expected every 2 to 3 minutes. This
unfortunately gives very poor response times to the end user, so I'm
looking at other alternatives now.

Currently, the postgresql installation is on a single disk, so all the
tables have their data read from that one disk. Searching on different
tables by multiple users at the same time results in very slow searches,
as everything is mainly dependent on the spindle speed. I recently
gained access to another server which has 3 SCSI disks. I know there is
a way to spread the tables across the three different disks, but I'm not
sure if it's as easy as symlinking the files (WAL files only?) across.
Can anyone please tell me what to do here and how to harness the power
of the three SCSI drives that I have? Which files in the data directory
need to be moved? Is this safe? Can backups etc. still be done easily?
Any information will be greatly appreciated. Thank you,


Steve
Goulet, Dick
2004-07-29 15:05:01 UTC
Steve,

I'm a little short on PostgreSQL experience, but gaining fast; I'm more of an Oracle nut. But I think you've pinpointed the problem, namely disk contention. The easiest solution I can think of would be to stripe the volume group across all three drives, thereby spreading the pain across them. One other item I'd look at is creating an index that specifically answers the most frequent queries. Depending on your OS, you could also mirror the data across two of the drives at the OS level, which would likewise spread the pain.

Another solution, but it has a price tag on it, is to acquire an external disk array. These arrays have memory that they use as additional buffers; the cache gets populated with the most frequently accessed data, and then you're not limited by the drives anymore.
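
For the index idea, a minimal sketch of the kind of thing I mean (the database, table, and column names are made up for illustration; match them to whatever your heaviest search actually filters on):

    # hypothetical: index the columns the most frequent searches filter on
    psql -d mydb -c "CREATE INDEX idx_orders_cust_date ON orders (customer_id, order_date);"

    # then confirm the planner actually uses it for that query
    psql -d mydb -c "EXPLAIN SELECT * FROM orders WHERE customer_id = 42 AND order_date > '2004-07-01';"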

Dick Goulet
Senior Oracle DBA
Oracle Certified 8i DBA

Kris Kiger
2004-07-29 15:16:49 UTC
Steve,

Are the three SCSI drives in a RAID array? If so, I would move the
production database to that machine. From what I understand, it can be a
real hassle to set up and maintain symlinked tables. If you don't have
the option of moving the database, I would explore one of these two
routes:

Route one is replication. Slony1 was just released to the open source
community. Replicate the data to the new machine, then distribute the
database reads between the master and the slave.

Route two is to symlink the database logs/transaction logs/etc. to the
new machine. It takes some of the write load off of your main machine.
Not as effective as route 1, I would say, but it would definitely lessen
the load.
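
A rough sketch of route two, 7.x style (paths are just examples, it assumes the extra disks are visible as a separate local disk or array, and the server must be shut down while you move things):

    # stop the postmaster before touching anything under the data directory
    pg_ctl stop -D /var/lib/pgsql/data

    # move the WAL directory onto a separate (faster / less busy) disk
    mv /var/lib/pgsql/data/pg_xlog /mnt/disk2/pg_xlog
    ln -s /mnt/disk2/pg_xlog /var/lib/pgsql/data/pg_xlog

    pg_ctl start -D /var/lib/pgsql/data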

Kris
Rob Bamber
2004-07-29 15:38:03 UTC
Another thought -

We had a similar issue recently. Our support guys dropped the database and
then rebuilt it from a dump file. The size of the data directory went down
from 12GB to less than 2GB. According to the sysadmin who did the work,
postgres is not very good at reclaiming disk space after large quantities
of tuples have been deleted over time.
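
If you want to try the same route, it was essentially the standard dump-and-restore cycle (the database name below is just a placeholder; keep a safe copy of the dump file before dropping anything):

    pg_dump mydb > mydb.sql     # dump schema and data to a file
    dropdb mydb                 # drop the bloated database
    createdb mydb               # recreate it empty
    psql mydb < mydb.sql        # reload everything from the dump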

HTH

Steve
2004-07-30 00:14:19 UTC
Post by Rob Bamber
Another thought -
We had a similar issue recently. Our support guys dropped the database and
then rebuilt it from a dump file. The size of the data directory went down
from 12GB to less than 2GB. According to the sysadmin who did the work,
postgres is not very good at reclaiming disk space after large quantities
of tuples have been deleted over time.
HTH
That's because you need to 'VACUUM [FULL | ANALYZE]' the database
frequently. For example, in our case there are about 2000
updates/inserts and around 50 deletes every 3 minutes. UPDATE and
DELETE operations are the most expensive ones (in terms of disk space
utilization), as the old versions of the updated/deleted tuples are kept
around until VACUUM is run. I've set it up so that a daemon runs VACUUM
ANALYZE every 6 minutes or so, and then a full vacuum, i.e. VACUUM FULL
ANALYZE, every 30 minutes. As far as I know, in postgresql 7.4 a plain
VACUUM doesn't block read operations, so occasionally running VACUUM on
the most 'popular' tables is a good idea. I'm sure in your case the
frequency of running vacuum would vary, but that is basically why the
database shrank from 12GB to 2GB. VACUUM removes the dead tuples (and
their index entries) and so 'compresses' the amount of disk space used,
which in turn can speed up queries by two to three orders of magnitude.
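
The "daemon" doesn't need to be anything fancy; a couple of cron entries calling vacuumdb would do the same job (database and table names below are placeholders, adjust the frequency to your write load):

    # crontab for the postgres user:
    # plain vacuum + analyze of the busiest table every 6 minutes
    */6 * * * *  vacuumdb --analyze --table big_table mydb
    # full vacuum + analyze of the whole database every 30 minutes
    */30 * * * * vacuumdb --full --analyze mydb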

Steve
Steve
2004-07-30 00:18:03 UTC
Post by Kris Kiger
Steve,
Are the three SCSI drives in a RAID array? If so, I would move the
production database to that machine. From what I understand, it can be a
real hassle to set up and maintain symlinked tables. If you don't have
the option of moving the database, I would explore one of these two
routes:
Route one is replication. Slony1 was just released to the open source
community. Replicate the data to the new machine, then distribute the
database reads between the master and the slave.
Route two is to symlink the database logs/transaction logs/etc. to the
new machine. It takes some of the write load off of your main machine.
Not as effective as route 1, I would say, but it would definitely lessen
the load.
Kris
Thanks for replying, Kris. Yes, the SCSI drives are in a RAID array. I
believe in that case I don't really need to do anything other than move
the database to that machine? I think the drives are striped with a 12K
stripe size or so, but I'm not sure. Moving the database to the new
machine has made it a lot faster than before. The machine has around 2GB
of RAM and dual 3.0 GHz Intel Xeon processors. I've also run 'chattr -R
+A' on the data directory (so atime is no longer updated on every read),
and I think it has made a slight difference. I'll test it a bit more and
see if I can do something else to improve the speed.
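
For anyone curious, the atime tweak is roughly this (the path is an example; the mount-option variant assumes the data directory sits on its own filesystem):

    # per-directory: stop the kernel from updating access times under the data dir
    chattr -R +A /var/lib/pgsql/data

    # or do it for the whole filesystem via mount options
    mount -o remount,noatime /var/lib/pgsql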

Steve
Andrew Biagioni
2004-07-29 15:16:39 UTC
Steve,

I'm not as much of an expert on PostgreSQL as others, but at least I
have SOME answer.

The way to leverage the multi-disk capability is to use RAID, either
software or hardware, rather than trying to twist the PostgreSQL
structure and files to do so.

RAID (Redundant Array of Inexpensive Disks) is basically a number of
methods to provide increased reliability and/or performance from
multiple disks, while having them appear to the programs (including DB
programs) as a single disk.

You will want to look for RAID information elsewhere, especially since
you should weigh the pros and cons of the various RAID levels, but
basically what you want is to have your data spread out uniformly across
the disks so that any writes and reads (in your case, especially the
reads) are done in parallel on all three devices, thus significantly
cutting the access time.

Another solution that MIGHT help is increasing the memory. If a
heavily-used index can be cached in RAM it will speed searches up a lot...

I would recommend that you also study ways to redistribute your data
and/or add better indices. Keep in mind that an index will be quite
large for a big table, and may not be used to its full extent even when
it might seem that it should (at least in my experience). You can also
use EXPLAIN and EXPLAIN ANALYZE to find out where your bottlenecks are,
to see what else can be done.

I hope this helps, and that the last paragraph didn't come across as
patronizing. I'm aware that maybe you already did all that, but I can't
know -- and if you didn't, then it would be a shame not to mention it!
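
For example, something along these lines (the query and table name are made up; substitute one of your actual slow searches):

    psql -d mydb -c "EXPLAIN ANALYZE SELECT * FROM big_table WHERE some_col = 'foo';"
    # watch for sequential scans over the 10GB tables, and for planner row
    # estimates that are far off from the actual counts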

Andrew
Jouneau Luc
2004-07-29 15:27:12 UTC
Hi,

I don't mean to offend you by asking whether you run VACUUM FULL
regularly. I suppose the 10GB really are full of useful data, aren't they?

Luc
Ian FREISLICH
2004-07-30 12:59:01 UTC
Post by Rob Bamber
Another thought -
We had a similar issue recently. Our support guys dropped the database
and then rebuilt it from a dump file. The size of the data directory
went down from 12GB to less than 2GB. According to the sysadmin who did
the work, postgres is not very good at reclaiming disk space after large
quantities of tuples have been deleted over time.
And another thought,

Have you tried clustering your tables on the most frequently used and/or
most time-critical index in your joins? (Remember to VACUUM ANALYZE
after the cluster has completed.) You might find a huge performance
increase. Clustering should also fix the problem mentioned above,
because the table is physically rewritten on disk. Use EXPLAIN ANALYZE
to find out which is the most time-consuming part of your query and
optimise that.

Also, have you analyzed your database recently? If you've never run
VACUUM ANALYZE, your query planner's statistics have never been updated,
so the planner might not be making the best choices.
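
A quick sketch of the clustering step, with placeholder names (note that CLUSTER takes an exclusive lock and rewrites the table while it runs):

    # 7.x syntax: CLUSTER indexname ON tablename
    psql -d mydb -c "CLUSTER big_table_idx ON big_table;"
    # refresh the planner statistics afterwards
    psql -d mydb -c "VACUUM ANALYZE big_table;"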

Ian

--
Ian Freislich
