Flutterby™! : Charset weirdness


Charset weirdness

2007-05-11 19:40:58.459458+02 by Dan Lyke 7 comments

A couple of you may have noticed problems with UTF-8 versus ISO-8859-1 character sets since the move to the new server. I just went into the code to see if I could hack around this, and discovered that the code should already be dealing with them.

Anyone know how the submitted form's character set should be sent to CGI.pm?

[ related topics: Web development ]

comments in ascending chronological order:

#Comment Re: made: 2007-05-11 19:41:16.963589+02 by: Dan Lyke

#Comment Re: made: 2007-05-11 19:44:16.832261+02 by: Dan Lyke

Crap, okay, that should have failed.

#Comment Re: made: 2007-05-11 21:22:39.368717+02 by: spc476

I don't recall it ever being sent (at least by Apache). However, you can do:

<FORM METHOD="post" ACTION="blah.cgi" ACCEPT-CHARSET="US-ASCII"> and hope the browser follows the hint. That's about the only way I know to do it.
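To make the ambiguity concrete, here is a minimal Python sketch (not the Flutterby code, which is Perl) of what the server receives when the same text is submitted under each character set. The field name `comment` and the sample text are made up for illustration. The request body is just percent-encoded bytes, so unless the browser honors a hint like the one above, the server has to guess.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical form field and sample text, for illustration only.
text = "café"

# The request body a browser would produce under each character set.
body_utf8 = urlencode({"comment": text}, encoding="utf-8")
body_latin1 = urlencode({"comment": text}, encoding="iso-8859-1")

print(body_utf8)    # comment=caf%C3%A9
print(body_latin1)  # comment=caf%E9

# Guessing wrong on the server side silently produces mojibake:
# the two UTF-8 bytes for "é" are read as two ISO-8859-1 characters.
wrong = parse_qs(body_utf8, encoding="iso-8859-1")["comment"][0]
print(wrong)        # cafÃ©
```

Nothing in either body says which encoding produced it, which is why the form hint (or a consistent page encoding) matters.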

#Comment Re: Meta Tag made: 2007-05-12 02:39:54.747917+02 by: Roger

You can also force the issue by setting the appropriate META tag in the header:

[meta http-equiv="Content-Type" content="text/html; charset=utf-8" /]

#Comment Re: made: 2007-05-12 15:53:29.355655+02 by: Dan Lyke

Roger, is that supposed to force the submit character set? My experience is that it doesn't necessarily.

And there's something weird about how Perl (especially in conjunction with mod_perl) is handling this stuff, at least in the case of my code.

I think I've got it working for comments, still need to make it work for the front page.

#Comment Re: meta + forms made: 2007-05-17 23:12:22.107524+02 by: Roger

In my experiments, that "meta" tag has worked wonders in accepting text copied from any number of sources (websites in shift-jis, text from Word) and sending form data that is utf-8 encoded to my application.

My app's in Python, which may be a factor. I've noticed that trying to pull UTF-8 out of MySQL with Python and pass it to PHP via XML-RPC is just a nightmare (well, and XML/XSLT via Popoon is in this mix, which likely does not help!).

#Comment Re: made: 2007-05-18 00:01:34.553537+02 by: Dan Lyke

Yeah, my problem is that I'm trying to detect anything that doesn't look like UTF-8 and convert it to standard ampersand-escaped entity encoding, and for some reason that isn't happening all the time. Come to think of it, it may be an IE thing... Hmmm... Might have to fire up the Windows box, but I *really* don't want to be mucking with web apps right now.
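For reference, the detect-and-convert approach described above could be sketched roughly like this in Python. This is an assumption-laden illustration, not the actual site code: the function name `normalize_submission` is hypothetical, and ISO-8859-1 as the fallback is a guess based on the character sets mentioned in the post.

```python
def normalize_submission(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Return valid UTF-8 input as text; otherwise entity-escape it.

    Hypothetical helper: the fallback charset is an assumption.
    """
    try:
        # Strict decoding raises if the bytes are not well-formed UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Treat the bytes as the legacy charset, then replace every
        # non-ASCII character with a numeric character reference.
        legacy = raw.decode(fallback)
        return legacy.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(normalize_submission("café".encode("utf-8")))       # café
print(normalize_submission("café".encode("iso-8859-1")))  # caf&#233;
```

The check is only a heuristic: some ISO-8859-1 byte sequences also happen to be well-formed UTF-8, so a few legacy submissions would slip through undetected, which could be one source of "not happening all the time".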



Flutterby™ is a trademark claimed by Dan Lyke for the web publications at www.flutterby.com and www.flutterby.net.