Discussion:
Bad character data
(too old to reply)
Noah Davis
2004-05-03 19:23:40 UTC
Permalink
I posted this to the pgsql general list, but alas, I did not get any
responses. Perhaps someone here could be of assistance?

I have a database with some bad characters in it -- some users had entered
MS Word smart quotes, em dashes, foreign characters, and they look like
gibberish coming out of the database. Most important are the smart quotes I
guess.

What's the best way to replace these characters? I thought I might be able
to run a simple SQL UPDATE command, but some of the gibberish for different
characters looks the same (at least from my client it does), and it would
clobber them all.

I have a feeling there's some sort of ASCII code or unicode solution to this
problem, but I could use am little push in the right direction.

Thanks.

Noah.


---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to ***@postgresql.org)
douglas morrison
2004-05-03 20:38:22 UTC
Permalink
The lack of responses is prolly because this sort of thing is usually
handled by the client... The client for input should be
stripping/converting to ASCII/unicode whichever chars are unwanted and
notifying the user if anything is removed/unusable. The client for
display should then be able to parse the chars correctly...

You might be able to use your current data if you change your SELECT to
something like:

SELECT to_ascii(columnName, 'LATIN1') AS convertedColumn
FROM tableName;


hth,
doug
Post by Noah Davis
I posted this to the pgsql general list, but alas, I did not get any
responses. Perhaps someone here could be of assistance?
I have a database with some bad characters in it -- some users had entered
MS Word smart quotes, em dashes, foreign characters, and they look like
gibberish coming out of the database. Most important are the smart quotes I
guess.
What's the best way to replace these characters? I thought I might be able
to run a simple SQL UPDATE command, but some of the gibberish for different
characters looks the same (at least from my client it does), and it would
clobber them all.
I have a feeling there's some sort of ASCII code or unicode solution to this
problem, but I could use am little push in the right direction.
Thanks.
Noah.
---------------------------(end of
broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend
Noah Davis
2004-05-03 20:55:08 UTC
Permalink
Hi Doug,

Thanks for the response. I do realize it's not the ideal situation, but it's
the database I inherited, so not much I can do there :) . I tried the
to_ascii and it doesn't seem to help much. It may be that I can try a
different client and get more legible characters. Not sure.

-----Original Message-----
From: douglas morrison [mailto:***@comcast.net]
Sent: Monday, May 03, 2004 4:38 PM
To: Noah Davis
Cc: pgsql-***@postgresql.org
Subject: Re: [ADMIN] Bad character data

The lack of responses is prolly because this sort of thing is usually
handled by the client... The client for input should be stripping/converting
to ASCII/unicode whichever chars are unwanted and notifying the user if
anything is removed/unusable. The client for display should then be able to
parse the chars correctly...

You might be able to use your current data if you change your SELECT to
something like:

SELECT to_ascii(columnName, 'LATIN1') AS convertedColumn
FROM tableName;


hth,
doug
Post by Noah Davis
I posted this to the pgsql general list, but alas, I did not get any
responses. Perhaps someone here could be of assistance?
I have a database with some bad characters in it -- some users had
entered MS Word smart quotes, em dashes, foreign characters, and they
look like gibberish coming out of the database. Most important are the
smart quotes I guess.
What's the best way to replace these characters? I thought I might be
able to run a simple SQL UPDATE command, but some of the gibberish for
different characters looks the same (at least from my client it does),
and it would clobber them all.
I have a feeling there's some sort of ASCII code or unicode solution
to this problem, but I could use am little push in the right
direction.
Thanks.
Noah.
---------------------------(end of
broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend
douglas morrison
2004-05-03 23:11:11 UTC
Permalink
no problem noah,

been there inheriting bad data, suXX0r. wish i could be of more help.
if all else fails look at the translate() function to replace the
offending chars... dunno what i can do, but which type of client are
you currently using?

--
doug
Post by Noah Davis
Hi Doug,
Thanks for the response. I do realize it's not the ideal situation, but it's
the database I inherited, so not much I can do there :) . I tried the
to_ascii and it doesn't seem to help much. It may be that I can try a
different client and get more legible characters. Not sure.
-----Original Message-----
Sent: Monday, May 03, 2004 4:38 PM
To: Noah Davis
Subject: Re: [ADMIN] Bad character data
The lack of responses is prolly because this sort of thing is usually
handled by the client... The client for input should be
stripping/converting
to ASCII/unicode whichever chars are unwanted and notifying the user if
anything is removed/unusable. The client for display should then be able to
parse the chars correctly...
You might be able to use your current data if you change your SELECT to
SELECT to_ascii(columnName, 'LATIN1') AS convertedColumn
FROM tableName;
hth,
doug
Post by Noah Davis
I posted this to the pgsql general list, but alas, I did not get any
responses. Perhaps someone here could be of assistance?
I have a database with some bad characters in it -- some users had
entered MS Word smart quotes, em dashes, foreign characters, and they
look like gibberish coming out of the database. Most important are the
smart quotes I guess.
What's the best way to replace these characters? I thought I might be
able to run a simple SQL UPDATE command, but some of the gibberish for
different characters looks the same (at least from my client it does),
and it would clobber them all.
I have a feeling there's some sort of ASCII code or unicode solution
to this problem, but I could use am little push in the right
direction.
Thanks.
Noah.
---------------------------(end of
broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to
---------------------------(end of
broadcast)---------------------------
TIP 8: explain analyze is your friend
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match
Noah Davis
2004-05-04 14:09:19 UTC
Permalink
I'm using psql through putty (an ssh client). Someone reported using
MindTerm as their ssh client and saw the characters ok...? Weird. But they
are definitely screwed up when viewed through the ultimate client (web
browser).

-----Original Message-----
From: pgsql-admin-***@postgresql.org
[mailto:pgsql-admin-***@postgresql.org] On Behalf Of douglas morrison
Sent: Monday, May 03, 2004 7:11 PM
To: Noah Davis
Cc: pgsql-***@postgresql.org
Subject: Re: [ADMIN] Bad character data

no problem noah,

been there inheriting bad data, suXX0r. wish i could be of more help.
if all else fails look at the translate() function to replace the offending
chars... dunno what i can do, but which type of client are you currently
using?

--
doug
Post by Noah Davis
Hi Doug,
Thanks for the response. I do realize it's not the ideal situation,
but it's the database I inherited, so not much I can do there :) . I
tried the to_ascii and it doesn't seem to help much. It may be that I
can try a different client and get more legible characters. Not sure.
-----Original Message-----
Sent: Monday, May 03, 2004 4:38 PM
To: Noah Davis
Subject: Re: [ADMIN] Bad character data
The lack of responses is prolly because this sort of thing is usually
handled by the client... The client for input should be
stripping/converting to ASCII/unicode whichever chars are unwanted and
notifying the user if anything is removed/unusable. The client for
display should then be able to parse the chars correctly...
You might be able to use your current data if you change your SELECT
SELECT to_ascii(columnName, 'LATIN1') AS convertedColumn
FROM tableName;
hth,
doug
Post by Noah Davis
I posted this to the pgsql general list, but alas, I did not get any
responses. Perhaps someone here could be of assistance?
I have a database with some bad characters in it -- some users had
entered MS Word smart quotes, em dashes, foreign characters, and they
look like gibberish coming out of the database. Most important are
the smart quotes I guess.
What's the best way to replace these characters? I thought I might be
able to run a simple SQL UPDATE command, but some of the gibberish
for different characters looks the same (at least from my client it
does), and it would clobber them all.
I have a feeling there's some sort of ASCII code or unicode solution
to this problem, but I could use am little push in the right
direction.
Thanks.
Noah.
---------------------------(end of
broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to
---------------------------(end of
broadcast)---------------------------
TIP 8: explain analyze is your friend
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match


---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html
douglas morrison
2004-05-04 14:33:39 UTC
Permalink
well if that truly is the case, then maybe the different ssh clients
are expecting the chars to be in a specific encoding... The web browser
view can be fixed using php, jsp, coldfusion or whatever your
middleware tier is to translate the data properly.
Post by Noah Davis
I'm using psql through putty (an ssh client). Someone reported using
MindTerm as their ssh client and saw the characters ok...? Weird. But they
are definitely screwed up when viewed through the ultimate client (web
browser).
-----Original Message-----
Sent: Monday, May 03, 2004 7:11 PM
To: Noah Davis
Subject: Re: [ADMIN] Bad character data
no problem noah,
been there inheriting bad data, suXX0r. wish i could be of more help.
if all else fails look at the translate() function to replace the offending
chars... dunno what i can do, but which type of client are you
currently
using?
--
doug
Post by Noah Davis
Hi Doug,
Thanks for the response. I do realize it's not the ideal situation,
but it's the database I inherited, so not much I can do there :) . I
tried the to_ascii and it doesn't seem to help much. It may be that I
can try a different client and get more legible characters. Not sure.
-----Original Message-----
Sent: Monday, May 03, 2004 4:38 PM
To: Noah Davis
Subject: Re: [ADMIN] Bad character data
The lack of responses is prolly because this sort of thing is usually
handled by the client... The client for input should be
stripping/converting to ASCII/unicode whichever chars are unwanted and
notifying the user if anything is removed/unusable. The client for
display should then be able to parse the chars correctly...
You might be able to use your current data if you change your SELECT
SELECT to_ascii(columnName, 'LATIN1') AS convertedColumn
FROM tableName;
hth,
doug
Post by Noah Davis
I posted this to the pgsql general list, but alas, I did not get any
responses. Perhaps someone here could be of assistance?
I have a database with some bad characters in it -- some users had
entered MS Word smart quotes, em dashes, foreign characters, and they
look like gibberish coming out of the database. Most important are
the smart quotes I guess.
What's the best way to replace these characters? I thought I might be
able to run a simple SQL UPDATE command, but some of the gibberish
for different characters looks the same (at least from my client it
does), and it would clobber them all.
I have a feeling there's some sort of ASCII code or unicode solution
to this problem, but I could use am little push in the right
direction.
Thanks.
Noah.
---------------------------(end of
broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to
---------------------------(end of
broadcast)---------------------------
TIP 8: explain analyze is your friend
---------------------------(end of
broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match
---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Loading...