Discussion: duplicates
Tsirkin Evgeny
2004-09-06 06:33:24 UTC
Hello dear list!
Here is the problem I have:
I am using PostgreSQL 7.3.4. I have an application that updates two tables.
When it needs to update something, it does not select the rows that are
already in the table, find the ones it needs to update, and execute an
UPDATE query. Instead, it deletes all the old rows and inserts the new ones.
However, under heavy load we get duplicate rows in the table, although we
use a transaction and both the delete and the new insert are in the same
transaction. We are pretty sure there is no bug in the application that
would insert the data more than once.
Is this a known problem? What could be the problem?
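In outline, each update does the following (a sketch with made-up names;
the real table definition appears in a later message):

BEGIN;
DELETE FROM some_table WHERE key_column = ...;  -- delete all the old rows
INSERT INTO some_table VALUES (...);            -- insert the new ones
COMMIT;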
--
Evgeny.

Andrei Bintintan
2004-09-06 07:48:17 UTC
I am not sure that I clearly understand your problem. Are you sure that your
queries are written correctly?

To avoid duplicates you can create a unique index; this will prevent any
duplicates in your table:

CREATE UNIQUE INDEX mytable_mycolumn_uniqueidx ON mytable(mycolumn);

If the rows are disappearing, please check your DELETE query, because that
is the only way to erase all the rows from the table.

If you still have problems, please post some queries and be more specific.

Best regards,
Andy.

Tsirkin Evgeny
2004-09-06 07:57:00 UTC
Post by Andrei Bintintan
If you still have problems, please post some queries and be more specific.
OK, I will try:
CREATE TABLE schedule (
    studentid decimal(9),
    groupid decimal(10),
    maslulsignid decimal(7),
    tfusot varchar(29)
);

That is the table for recording the courses/groups a student takes.
Note there is NO unique constraint here (maybe there should be, but that
is the way it is and it probably can't be changed). When changing groups
for a student, the application deletes all the groups the student had:

delete from schedule where studentid=11111;

and then inserts the new groups (regular inserts).
Now sometimes we get duplicates of the same groupid and studentid in the
table (every column is the same). I thought that maybe the DELETE does
not delete, but the INSERT succeeds?

Note: I know that a unique key and a constraint could be created, but
before that we have to understand why the duplicates were created.
--
Evgeny.

Andrei Bintintan
2004-09-06 08:31:12 UTC
Post by Tsirkin Evgeny
CREATE TABLE schedule (
    studentid decimal(9),
    groupid decimal(10),
    maslulsignid decimal(7),
    tfusot varchar(29)
);
Use something like this:

CREATE TABLE schedule (
    id serial PRIMARY KEY,
    studentid decimal(9),
    groupid decimal(10),
    maslulsignid decimal(7),
    tfusot varchar(29)
);

Now, ALWAYS use an ID for a table. This id will always be unique (because it
is the primary key). You don't have to supply this field when you make an
insert; the server fills it in automatically. And when you delete, you refer
to this key and not to the student ID.

For example: you have student id = 1111 in 3 groups:

id  studentid  groupid  maslulsignid
 1       1111       22  some val
 2       1111       33  some val
 3       1111       44  some val

If you run "DELETE FROM schedule WHERE studentid=1111", it deletes all three
rows from the table. If you want to delete only row 2, then you run:
DELETE FROM schedule WHERE id=2.

Normally, in this kind of table you should never have duplicates. I mean, in
tables you should not have duplicate rows, with all fields (in your case
studentid, groupid, maslulsignid, tfusot) holding the same values; that
indicates a design error.

Hope this helps.

Best regards.
Tsirkin Evgeny
2004-09-06 09:19:59 UTC
Yes, I understand that I can create a primary key/unique constraint etc.;
however, my point is that I first have to understand whether and why I got
duplicate rows inserted.
So here is the picture:
an application is deleting rows and, right after that, inserting new ones,
some of which are the same as the old ones, and I am getting duplicates!
Shouldn't a transaction prevent this by not letting the INSERT do anything
in case the DELETE did not succeed?
Another possibility is that I have two clients deleting and then inserting
the same thing into the table, but again, shouldn't a transaction prevent
duplicates in that case?
--
Evgeny.

Kostis Mentzelos
2004-09-06 09:40:55 UTC
No, you should lock the table.
Kostis Mentzelos
2004-09-06 10:08:30 UTC
Post by Kostis Mentzelos
No, you should lock the table.
I mean that another session can still select from or insert into that table
even while a transaction is running.
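For example, the delete/insert sequence could take an explicit lock first
(a sketch; SHARE ROW EXCLUSIVE is one lock mode that conflicts with itself,
so two sessions cannot run this block against the same table at the same
time; the values are only illustrative):

BEGIN;
LOCK TABLE schedule IN SHARE ROW EXCLUSIVE MODE;
-- from here on, any other session requesting the same lock waits
DELETE FROM schedule WHERE studentid = 11111;
INSERT INTO schedule (studentid, groupid, maslulsignid, tfusot)
    VALUES (11111, 22, 1, 'x');
COMMIT;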
Andrei Bintintan
2004-09-06 10:04:07 UTC
Post by Tsirkin Evgeny
Yes, I understand that I can create a primary key/unique constraint etc.;
however, my point is that I first have to understand whether and why I got
duplicate rows inserted. An application is deleting rows and, right after
that, inserting new ones, some of which are the same as the old ones, and
I am getting duplicates!
Maybe you are inserting that duplicate pair more than once. Nothing in your
table definition stops this.
Post by Tsirkin Evgeny
Shouldn't a transaction prevent this by not letting the INSERT do anything
in case the DELETE did not succeed?
Only if you check your queries' results and issue a ROLLBACK to the
transaction in case of an error.
Post by Tsirkin Evgeny
Another possibility is that I have two clients deleting and then inserting
the same thing into the table, but again, shouldn't a transaction prevent
duplicates in that case?
No, perhaps you are misunderstanding transactions. Transactions don't
prevent anything unless you tell them to.

For example:
BEGIN
Delete query -- if there is an error here, you should run a ROLLBACK and
not run any insert.
If the delete query succeeds, then you run the inserts... etc. etc.
At the end of the transaction you COMMIT ONLY if everything worked as you
wanted!

The unique indexes help you prevent double inserts into the database.
For example:

CREATE UNIQUE INDEX schedule_student_group_uniqueidx ON schedule(studentid, groupid);

will assure you that you have only one row with this pair of data. By assure
I mean that if you have the pair (1111, 22) in the table and you want to
insert another (1111, 22) pair into the table, the insert query will fail.

Within the transactions you will have to check the result of every
DELETE/INSERT/UPDATE query, so that you know when there was a failure.

You can also look in the database log file and see exactly what happened
there.


Hope this helps.
Michael Paesold
2004-09-06 10:40:01 UTC
Post by Tsirkin Evgeny
Doesn't the Serializable isolation level ensure that? What I thought is
that when using this level, the BEGIN and COMMIT behave the same as the
code you wrote, since the second concurrent transaction (the one that
started a little later) should wait for the first one and only then DELETE.
Isn't that true?
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

DELETE FROM schedule WHERE studentid = ... ;
INSERT INTO schedule (studentid, ...) VALUES (... );
INSERT INTO schedule (studentid, ...) VALUES (... );

COMMIT;

If you do it like the above SQL code, there is still a problem. The
serializability check here only works if the DELETE FROM schedule ...
really finds at least one row. If it does not, it will not recognize the
serialization problem, so it is still possible that your program creates
duplicates.

I have tested this here. I don't really know if this is just the case with
PostgreSQL serializable transactions (an MVCC limitation) or a general
problem.
Post by Tsirkin Evgeny
PS: Doesn't perl::DBI use the serializable level?
I don't really know about DBI; you should really check that:
* check that autocommit is off
* check that the transaction isolation level is serializable
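Both can also be checked from SQL (a sketch; note that the server-side
autocommit setting exists in 7.3 but was removed in 7.4, and DBI's
AutoCommit attribute is a separate client-side setting):

SHOW autocommit;                  -- server-side setting, 7.3 only
SHOW TRANSACTION ISOLATION LEVEL;
SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL SERIALIZABLE;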


Still, the serializable transaction level is not enough. An approach that
allows more concurrency is possible, but you have to make sure that all
relevant code does the locking in the same way (assuming you have a student
table with studentid as its primary/unique key):

BEGIN;

SELECT * FROM student WHERE studentid = ... FOR UPDATE;
-- lock the student record

DELETE FROM schedule WHERE studentid = ... ;
INSERT INTO schedule (studentid, ...) VALUES (... );
INSERT INTO schedule (studentid, ...) VALUES (... );

COMMIT;

This will not lock the whole schedule table, but only one row of the student
table.

I hope that helps.

Best Regards,
Michael Paesold


Michael Paesold
2004-09-06 10:51:45 UTC
There is a detailed description of this problem in Section 12.2.2.1 of the
PostgreSQL 8 docs here:
http://developer.postgresql.org/docs/postgres/transaction-iso.html

(Serializable Isolation versus True Serializability)

Best Regards,
Michael Paesold


Tsirkin Evgeny
2004-09-06 10:56:49 UTC
You are great, Michael!
Thanks.
--
Evgeny.

Tsirkin Evgeny
2004-09-06 12:10:09 UTC
One more question: how did you test it?
Post by Michael Paesold
I have tested this here.
--
Evgeny.

Michael Paesold
2004-09-06 12:30:52 UTC
Post by Tsirkin Evgeny
One more question: how did you test it?
First I created your schedule table. Then I opened two psql sessions...

Session A                     Session B

BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
DELETE FROM schedule WHERE studentid = 1;
INSERT INTO schedule VALUES (1, 1, 0, 0);

                              BEGIN;
                              SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
                              DELETE FROM schedule WHERE studentid = 1;
                              -- if there existed records for this delete,
                              -- session B will now wait for session A to
                              -- commit/rollback and error out on commit;
                              -- otherwise no error

COMMIT;
                              INSERT INTO schedule VALUES (1, 1, 0, 0);
                              COMMIT;


You can also try rolling back the first transaction, etc.

Again, what really helps is:

ALTER TABLE schedule ADD CONSTRAINT uniq_schedule UNIQUE (studentid, groupid);

Best Regards,
Michael Paesold


Jean-Luc Lachance
2004-09-06 14:36:21 UTC
If the records you are inserting do not depend on data in the records you
want to delete, why not simply use a trigger?

Before insert, delete the record with the same key!
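A sketch of such a trigger (assuming studentid plus groupid is "the same
key"; the function and trigger names are made up, and the plpgsql language
must already be installed in the database, e.g. via createlang):

CREATE FUNCTION schedule_replace() RETURNS trigger AS '
BEGIN
    -- drop any existing row with the same key before inserting
    DELETE FROM schedule
     WHERE studentid = NEW.studentid
       AND groupid = NEW.groupid;
    RETURN NEW;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER schedule_replace_trg
    BEFORE INSERT ON schedule
    FOR EACH ROW EXECUTE PROCEDURE schedule_replace();

Note that the trigger's DELETE cannot see rows another session has inserted
but not yet committed, so under concurrency this still does not replace a
unique constraint.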