Discussion:
Can I change the character encoding for a DB?
(too old to reply)
Iain
2004-01-26 08:28:49 UTC
Permalink
Hi All,

I created a DB with SQL_ASCII encoding but I want to change it to EUC_JP.
Can I do this without creating a new database? I didn't find a way in the
docs, sorry.

set server_encoding = EUC_JP;

didn't work.

My understanding is that the encoding is purely information and doesn't
actually affect the data stored in the DB. Is that right?

The reason I want to do this is that my java app is corrupting Japanese
Kanji. I suspect that the JDBC driver uses this information. VB apps using
ODBS seem to be fine.
Tom Lane
2004-01-26 17:05:55 UTC
Permalink
Post by Iain
I created a DB with SQL_ASCII encoding but I want to change it to EUC_JP.
Can I do this without creating a new database?
If you're certain that the data already in the database is in EUC_JP
encoding, it would probably work to do a quick UPDATE on the
pg_database.encoding entry for the database (and then start fresh
backend sessions, since they won't notice otherwise). If there's data in
there that is not correct per EUC_JP then this will create big problems,
so don't say I didn't warn you ...

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org
Iain
2004-01-27 02:37:41 UTC
Permalink
Hi Tom,

Thanks for the info and warning. Fortunately this is just a development
database so any problems won't be too big a deal.

Does dumping and re-creating the DB avoid such a potential problem?

I read the info written by Tasuo, but I have to admit that the whole
encoding thing is still full of mystery for me. For example, if the database
has a SQL_ASCII encoding and you enter japanese text, what encoding is used?
Does it depend on the client used to enter it? I wonder if it is possible
that there is some SJIS data in my DB... My understanding is that the DB
doesn't ensure that data makes sense in the set encoding, it just allows the
db to make conversions when dealing with clients such as the ODBC and JDBC
drivers.

Anyway, in this case the base data came from a EUC_JP encoded DB, but a VB
app (and some psql scripts) has been used to enter and modify data. It uses
the ODBC driver provided for postgres, everything had been working smoothly
and I only noticed a problem when I tried getting data using java and the
jdbc driver. I created a new mini test DB with the EUC_JP encoding and the
data was returned correctly.

regards,
Iain


----- Original Message -----
From: "Tom Lane" <***@sss.pgh.pa.us>
To: "Iain" <***@mst.co.jp>
Cc: <pgsql-***@postgresql.org>
Sent: Tuesday, January 27, 2004 2:05 AM
Subject: Re: [ADMIN] Can I change the character encoding for a DB?
Post by Tom Lane
Post by Iain
I created a DB with SQL_ASCII encoding but I want to change it to EUC_JP.
Can I do this without creating a new database?
If you're certain that the data already in the database is in EUC_JP
encoding, it would probably work to do a quick UPDATE on the
pg_database.encoding entry for the database (and then start fresh
backend sessions, since they won't notice otherwise). If there's data in
there that is not correct per EUC_JP then this will create big problems,
so don't say I didn't warn you ...
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html
Tom Lane
2004-01-27 03:45:09 UTC
Permalink
Post by Iain
I read the info written by Tasuo, but I have to admit that the whole
encoding thing is still full of mystery for me. For example, if the database
has a SQL_ASCII encoding and you enter japanese text, what encoding is used?
Does it depend on the client used to enter it?
I'm not an expert on this stuff, but my understanding is that setting
the encoding to SQL_ASCII disables all encoding-conversion logic ---
whatever bytes the client sends are what get stored. So if you want
to change the setting from SQL_ASCII to something else, you'd better be
sure that everything every client has stored is compatible with the new
encoding specification.
Post by Iain
My understanding is that the DB
doesn't ensure that data makes sense in the set encoding,
When you select a real encoding spec (not SQL_ASCII) then the DB does
check that entered strings are valid in that encoding. Also it will
attempt to do encoding conversion, if clients specify they are using
a client_encoding that is different from the database encoding (and
is not SQL_ASCII).

Hope that helps --- if not, try Tatsuo ...

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match
Iain
2004-01-27 04:32:51 UTC
Permalink
Hi Tom,

Thanks for that. It's somewhat clearer now.

I went ahead and rebuilt the DB with no problems so I guess that whatever
data was in the DB converted OK.

Regards
Iain
Post by Tom Lane
Post by Iain
I read the info written by Tasuo, but I have to admit that the whole
encoding thing is still full of mystery for me. For example, if the database
has a SQL_ASCII encoding and you enter japanese text, what encoding is used?
Does it depend on the client used to enter it?
I'm not an expert on this stuff, but my understanding is that setting
the encoding to SQL_ASCII disables all encoding-conversion logic ---
whatever bytes the client sends are what get stored. So if you want
to change the setting from SQL_ASCII to something else, you'd better be
sure that everything every client has stored is compatible with the new
encoding specification.
Post by Iain
My understanding is that the DB
doesn't ensure that data makes sense in the set encoding,
When you select a real encoding spec (not SQL_ASCII) then the DB does
check that entered strings are valid in that encoding. Also it will
attempt to do encoding conversion, if clients specify they are using
a client_encoding that is different from the database encoding (and
is not SQL_ASCII).
Hope that helps --- if not, try Tatsuo ...
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Loading...