That UTF-8 error
2008-06-06 18:56:07.343985+00 by Dan Lyke 16 comments
Okay, if anyone out there knows a little bit about Perl and character sets, I need some help with that UTF-8 character people have been running into... More in the comments.
[ related topics: Flutterby Meta Perl Open Source ]
comments in ascending chronological order:
#Comment Re: made: 2008-06-06 19:05:04.556442+00 by: Dan Lyke
Hang on, testing to make sure it's not something simple and stupid...
#Comment Re: made: 2008-06-06 19:06:56.419288+00 by: Dan Lyke [edit history]
A Euro symbol:
Adding a `
#Comment Re: made: 2008-06-06 19:07:33.189242+00 by: ziffle
Dan, it's about two US dollars.
#Comment Re: made: 2008-06-06 19:14:36.40769+00 by: ebradway [edit history]
`try this' - hmmm...
#Comment Re: made: 2008-06-06 19:16:00.740395+00 by: ebradway
Yep. It's the grave (below the tilde). Right now it's only when you edit an existing comment - not when you post a new comment.
#Comment Re: made: 2008-06-06 19:18:46.755553+00 by: ebradway [edit history]
`` I think you got it...
#Comment Re: made: 2008-06-06 19:22:29.670531+00 by: Dan Lyke [edit history]
So depending on the client character set, the code looks like:
while ($t =~ /^(.*?)([\x80-\x{ffff}])(.*)$/) { $t = sprintf("%s&#%d;%s",$1,ord($2),$3); }
or
while ($t =~ /^(.*?)([\x80-\xff])(.*)$/) { $t = sprintf("%s&#%d;%s",$1,ord($2),$3); }
I've even tossed the second in the first code path, thinking that at worst it'd give me bad characters, but neither seems to be substituting correctly.
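One thing worth ruling out: both patterns are anchored with ^ and $ and use . without the /s modifier, so a string with an embedded newline can never match and the loop exits without substituting anything. That may or may not be what is going wrong here. A minimal sketch of an alternative, assuming the client character set is known and that Encode recognizes its name: decode the bytes first, then do one global substitution. The helper name escape_non_ascii is hypothetical, not part of the existing code.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: $charset is whatever encoding the client declared.
sub escape_non_ascii {
    my ($bytes, $charset) = @_;

    # Decode raw bytes into Perl characters so that ord() returns
    # Unicode code points rather than individual byte values.
    my $text = decode($charset, $bytes);

    # Single global substitution with no anchors, so embedded newlines
    # do not prevent a match.
    $text =~ s/([^\x00-\x7f])/sprintf('&#%d;', ord($1))/ge;

    return $text;
}

# Windows-1252 curly quotes around a word, with an embedded newline.
print escape_non_ascii("\x93quoted\x94\nsecond line", 'cp1252'), "\n";

# A Euro sign sent as UTF-8 bytes.
print escape_non_ascii("\xe2\x82\xac 5", 'UTF-8'), "\n";
```

The result is pure ASCII, so it should be safe to hand to a database that expects UTF-8 regardless of what the client sent.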
#Comment Re: made: 2008-06-06 19:48:02.415191+00 by: spl
I'm glad I get paid in right now! :)
#Comment Re: made: 2008-06-06 19:48:54.109036+00 by: spl
xcellent!
#Comment Re: made: 2008-06-06 19:49:33.155419+00 by: spl
As you can see by my comment spam, it worked in both Text and HTML modes for me.
#Comment Re: made: 2008-06-06 20:05:10.910763+00 by: ziffle [edit history]
#Comment Re: made: 2008-06-06 20:05:33.508416+00 by: ziffle
Off topic, but giving Burning Man a run for the money, err, Euro:
http://www.the-twelfth.org.uk/images/S1010010.JPG
#Comment Re: made: 2008-06-16 20:31:47.700346+00 by: ebradway [edit history]
Test:
“We don’t want to cast a pall over the blogosphere by being heavy-handed, so we have to figure out a better and more positive way to do this,” Mr. Kennedy said.
Getting: ERROR: invalid byte sequence for encoding "UTF8": 0x93
Single and double quotes in UTF-8 work when converted to � tags.
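For reference, 0x93 is the left curly double quote in Windows-1252, and a lone 0x93 byte is not valid UTF-8, which would explain the error if the pasted text arrived in that encoding. A rough sketch of how that could be checked, assuming Perl's Encode module; the helper name is_valid_utf8 is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: true if the byte string decodes cleanly as UTF-8.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK flag may modify its argument
    return eval { decode('UTF-8', $copy, FB_CROAK); 1 } ? 1 : 0;
}

# Curly quotes as Windows-1252 bytes: not valid UTF-8.
print is_valid_utf8("\x93quote\x94") ? "valid" : "invalid", "\n";

# The same quotes as UTF-8 bytes: valid.
print is_valid_utf8("\xe2\x80\x9cquote\xe2\x80\x9d") ? "valid" : "invalid", "\n";

# Byte 0x93 read as Windows-1252 maps to U+201C.
printf "U+%04X\n", ord(decode('cp1252', "\x93"));
```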
#Comment Re: made: 2008-06-16 20:43:24.778923+00 by: Dan Lyke
Yeah, the portion of the code that's not working is supposed to do exactly that.
#Comment Re: made: 2008-06-16 23:38:40.49285+00 by: Dan Lyke
Further testing:
We dont want to cast a pall over the blogosphere by being heavy-handed, so we have to figure out a better and more positive way to do this, Mr. Kennedy said.
#Comment Re: made: 2008-06-16 23:43:43.54768+00 by: Dan Lyke
Some random quotes to see if this is licked:
They will attempt to define clear standards as to how much of its articles and broadcasts bloggers and Web sites can excerpt without infringing on The A.P.s copyright.
Calling it probably the greatest tournament Ive ever had, Tiger Woods
I had to walk to school not too far, a couple miles.