Discussion:
Going from a DB using SQL_ASCII to UNICODE
Dion Almaer
2004-04-19 19:44:33 UTC
Hi -

We are running PostgreSQL 7.1.4, and are finally upgrading... to 7.4.2.

While we do this upgrade, we also want to create our DB as UNICODE (we
are having problems with SQL_ASCII where we get a bunch of ? marks).

When we try to import the data from a SQL_ASCII dumped db, into the new
UNICODE db, everything freaks out.
We get errors restoring from dumps that used COPY and with INSERT.

We also tried to use the 7.4 pg_dump to make sure that it wasn't a problem
with the old pg_dump.

The errors are:

- ERROR: missing data for column "noiselevel"
- CONTEXT: COPY messages, line 1: "4393 -1 1441 14147 0
2000-10-12 18:30:58-05 EJB Design Questions \N Hi,"
- FATAL: invalid frontend message type 60
- ERROR: invalid byte sequence for encoding "UNICODE": 0xe9616c

Anyway, if anyone has any recommendations on how to migrate from SQL_ASCII
to UNICODE, with the knowledge that we are going from 7.1 to 7.4 at the same
time.... You will be life-savers!

Cheers,

Dion


---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to ***@postgresql.org)
CoL
2004-04-19 21:56:17 UTC
hi,
Post by Dion Almaer
- ERROR: missing data for column "noiselevel"
- CONTEXT: COPY messages, line 1: "4393 -1 1441 14147 0
2000-10-12 18:30:58-05 EJB Design Questions \N Hi,"
For this one, it is better to dump with pg_dump's -d option (data as INSERT
commands rather than COPY).

C.
CoL
2004-04-19 22:02:07 UTC
hi,
Post by Dion Almaer
- ERROR: invalid byte sequence for encoding "UNICODE": 0xe9616c
And for this one: convert your dump to Unicode (UTF-8) with iconv, then load it.

C.
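
A minimal Python sketch of what the iconv step amounts to, assuming the dump is really LATIN1. The thread has not established the source encoding at this point, so that is a guess. The function name recode_dump and the sample row are hypothetical, made up for illustration.

```python
def recode_dump(raw: bytes, source_encoding: str = "latin-1") -> bytes:
    """Re-encode dump bytes from source_encoding to UTF-8.

    source_encoding is an assumption; the thread does not say what
    encoding the SQL_ASCII database actually contains.
    """
    text = raw.decode(source_encoding)  # bytes -> Unicode text
    return text.encode("utf-8")         # Unicode text -> UTF-8 bytes


# Hypothetical dump line containing the byte 0xE9 (e-acute in LATIN1).
sample = b"INSERT INTO messages VALUES ('r\xe9al');\n"
converted = recode_dump(sample)
```

If the data is in some other encoding, pass it as source_encoding. Decoding as latin-1 never raises an error, so a clean run does not prove the guess was right.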
Tom Lane
2004-04-20 01:20:44 UTC
Post by Dion Almaer
When we try to import the data from a SQL_ASCII dumped db, into the new
UNICODE db, everything freaks out.
I'm no expert on this stuff, but I think what you need to do is add

set client_encoding = sql_ascii;

to the top of the dump file. (As of fairly recently, pg_dump will
automatically add such a SET, but I'm pretty sure 7.4.2 won't.)

If that doesn't help, then what you have is actually not valid UTF-8
data, and what you'll have to do is figure out what encoding it's in and
specify that instead. If it's in a mishmash of different encodings,
you're in for some pain :-(

regards, tom lane
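
To make the "figure out what encoding it's in" step concrete, here is a rough Python sketch that checks the bytes from the reported error (0xe9616c) against a few candidate encodings. The helper names and the candidate list are assumptions chosen for illustration, not anything from PostgreSQL.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def candidate_decodings(raw: bytes, encodings=("utf-8", "latin-1", "cp1252")):
    """Map each encoding that can decode raw to the resulting text."""
    results = {}
    for name in encodings:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            continue  # this encoding cannot represent these bytes
    return results


# The byte sequence from the error message in the original post.
reported = bytes.fromhex("e9616c")
print(is_valid_utf8(reported))
print(candidate_decodings(reported))
```

In UTF-8, 0xE9 starts a three-byte sequence and must be followed by two continuation bytes, but 0x61 and 0x6C are plain ASCII letters, which is why the bytes are rejected. Several single-byte encodings will decode almost anything, so the output still has to be read by a human to judge which candidate is plausible.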

Dion Almaer
2004-04-20 01:24:25 UTC
Tom,

I will try to add "set client_encoding = sql_ascii;" and pray :)

Cheers,

Dion
Frank Finner
2004-04-20 05:05:51 UTC
Hi,

I answered a similar problem on the [GENERAL] list a few days ago. In
short: You can use "recode" in a pipe between dump and restore like
"pg_dump database|recode ascii..utf8|psql newdatabase" to get rid of unicode
problems while transferring from a different encoding. Please look at my
statements on the [GENERAL] list for the gory details.

Regards, Frank.


Peter Eisentraut
2004-04-21 09:51:30 UTC
Post by Frank Finner
I answered a similar problem on the [GENERAL] list a few days ago. In
short: You can use "recode" in a pipe between dump and restore like
"pg_dump database|recode ascii..utf8|psql newdatabase" to get rid of
unicode problems while transferring from a different encoding. Please look
at my statements on the [GENERAL] list for the gory details.
This is not necessary. You just need to set the client encoding correctly
(not SQL_ASCII, but something like LATIN1). The encoding conversion happens
automatically, as usual.
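
As a sketch of this approach, the following Python snippet prepends a SET client_encoding line to the raw dump before it is fed to psql. It assumes the data is LATIN1, which is only the example encoding named above; the function name and the sample dump line are hypothetical.

```python
def with_client_encoding(dump: bytes, encoding: str = "LATIN1") -> bytes:
    """Return the dump with a SET client_encoding statement on top.

    The dump bytes themselves are not modified. The encoding value is
    an assumption and must match what the data really contains.
    """
    header = f"SET client_encoding = '{encoding}';\n".encode("ascii")
    return header + dump


# Hypothetical one-line dump containing a LATIN1 byte (0xE9).
sample_dump = b"INSERT INTO messages VALUES ('r\xe9al');\n"
prepared = with_client_encoding(sample_dump)
```

Because the data bytes are left untouched, the conversion to UNICODE is left to the server when the prepared dump is loaded into the new database.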

