Discussion:
Utilizing multiple disks
(too old to reply)
Steve
2004-07-21 23:44:55 UTC
Permalink
Hi,

I've been running postgres on my server for over a year now and the
tables have become huge. I have 3 tables that have data over 10GB each
and these tables are read very very frequently. In fact, heavy searches
on these tables are expected every 2 to 3 minutes. This unfortunately
gives a very poor response time to the end user and so I'm looking at
other alternatives now.

Currently, the postgresql installation is on a single disk and so all
the tables have their data read from a single disk. Searching on
different tables by multiple users at the same time results in very slow
searches, as it's mainly dependant on the spindle speed. I recently
gained access to another server which has 3 SCSI disks. I know there is
a way to mirror the tables across the three different disks but I'm not
sure if it's as easy as symlinking the files (WAL files only?) across.
Can anyone please tell me what to do here and how to harness the power
of the three SCSI drives that I have. Which files in the data directory
need to be moved? Is this safe? Can backups etc be easily done? Any
information will be greatly appreciated. Thank you,

Steve
Gaetano Mendola
2004-07-25 00:03:59 UTC
Permalink
Post by Steve
Hi,
I've been running postgres on my server for over a year now and the
tables have become huge. I have 3 tables that have data over 10GB each
and these tables are read very very frequently. In fact, heavy searches
on these tables are expected every 2 to 3 minutes. This unfortunately
gives a very poor response time to the end user and so I'm looking at
other alternatives now.
This depend on the query that you are running on it:

Are you performing queries using the like operator? If yes did you define
an index on it using the right operator class ?

Are you performing queries on a calculated field ? If yes then you need
to construct a sort of materialized view.

If you are on Linux did you mounted your data partition with the option
noatime ?


Please provide use more information on your queries and on your datas,
your configurations...

Usualy split your tables on multiple disk is the last optimization step,
are you sure did you already reach the bootleneck of your sistem ?



Regards
Gaetano Mendola
Steve
2004-07-26 01:46:22 UTC
Permalink
Gaetano,

Thanks for your reply.
Post by Gaetano Mendola
Are you performing queries using the like operator? If yes did you define
an index on it using the right operator class ?
Yes I do use the like operator to perform queries on a few 'varchar' and
'text' fields. I have indexed almost every column in the table (because
the client application allows complex searches on multiple columns and
each column has an equal chance of being used in the 'WHERE' condition)
Post by Gaetano Mendola
Are you performing queries on a calculated field ? If yes then you need
to construct a sort of materialized view.
Hmm not sure what you mean by a calculated field. Since I'm not sure
what it means, I'm probably not using it. I'm definitely not running any
functions on columns to get an accumulated result back etc.
Post by Gaetano Mendola
If you are on Linux did you mounted your data partition with the option
noatime ?
Yes I'm on linux and am using the ext3 filesystem. I am just about to
mount the data partition with the noatime option. Docos do say that the
speed increases considerably. I'll give it a shot but I'm still very
interested in utilizing the extra three SCSI disks I have at my disposal.
Post by Gaetano Mendola
Usualy split your tables on multiple disk is the last optimization step,
are you sure did you already reach the bootleneck of your sistem ?
Actually IO is currently the only bottleneck for my system. I was using
a single drive and had optimized it as much as I could (used indexes,
partitioned tables horizontally on old dates etc) but I thought it would
help me on the long run if I could store the actualy physical 'table
files' on separate disks (as the rule says: more spindles == parallel
IO). My current database basically has quite a few tables, out of which
only four are the 'largest'. Each table is completely independant (I
don't have a relational database setup because I didn't need it in this
case) and is accessed simultaneously by around 10 users at one time.
Each table also has almost all of its columns indexed. some of these
columns are of type 'varchar', 'text' or 'double'. I am aware that the
'text' fields are stored internally in a separate table. Does this
complicate things? Is it as easy as moving the files (I don't know which
ones though - please help me out here) to an independant drive and then
creating symbolic links to it to the '/opt/postgresql/data/...'
directory? How do I achieve this without losing any of the data.

Database integrity is of utmost importance and so is speed. I know there
are tradeoffs but I really do think that moving the data to a separate
disk will help. Can you please tell me how to set this up? Thanks a lot!

Steve
Gaetano Mendola
2004-07-26 11:59:12 UTC
Permalink
Post by Steve
Gaetano,
Thanks for your reply.
Post by Gaetano Mendola
Are you performing queries using the like operator? If yes did you define
an index on it using the right operator class ?
Yes I do use the like operator to perform queries on a few 'varchar' and
'text' fields. I have indexed almost every column in the table (because
the client application allows complex searches on multiple columns and
each column has an equal chance of being used in the 'WHERE' condition)
And did you use the right operator class ?
See http://www.postgresql.org/docs/7.4/interactive/indexes-opclass.html
for details
Post by Steve
Post by Gaetano Mendola
Are you performing queries on a calculated field ? If yes then you need
to construct a sort of materialized view.
Hmm not sure what you mean by a calculated field. Since I'm not sure
what it means, I'm probably not using it. I'm definitely not running any
functions on columns to get an accumulated result back etc.
if you have for example a view like:

CREATE VIEW v_test AS
SELECT foo(a) AS field_1,
bar(a+b) AS field_2
FROM test;


and you do: SELECT * from v_test WHERE field_2 = 123;

then you are in trouble!
Post by Steve
Post by Gaetano Mendola
If you are on Linux did you mounted your data partition with the option
noatime ?
Yes I'm on linux and am using the ext3 filesystem. I am just about to
mount the data partition with the noatime option. Docos do say that the
speed increases considerably. I'll give it a shot but I'm still very
interested in utilizing the extra three SCSI disks I have at my disposal.
mount that disk in RAID configuration then, you can at least try to move the
WAL on a separate disk.
Post by Steve
Database integrity is of utmost importance and so is speed. I know there
are tradeoffs but I really do think that moving the data to a separate
disk will help. Can you please tell me how to set this up? Thanks a lot!
Steve
The version 7.5 will give you the tablespaces feature, in the mean time you
can move your table and or index across your disk using symlinks, moving the
file associated with your table or index in another disk and replace it with a
link.


Regards
Gaetano Mendola

Loading...