Problem with UTF-8 encoding

Community Beginner, Jan 18, 2007

The static text on my pages finally displays correctly, but the dynamic text, which is queried from a MySQL database, does not.

I have checked the database: the Spanish, French, and other translations of the contents are there with the correct lettering. I have updated the MySQL drivers to 5.0 as recommended by Adobe, and I have added ?useUnicode=true&characterEncoding=UTF-8 to the JDBC URL string, as suggested on another forum. I have even checked the properties of every page to make sure they are saved in UTF-8. Below is a sample of the code I am using. What is wrong with the code, or what do I need to change to fix this problem? You may check the site at www.scoringag.com and try the language translations to see further examples of the problem.

We are using CFMX 7, MySQL 4.1, and Connector/J 5.0.

Sample code below:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>

<!---
**
* CF MX Admin "Application.cfm" file
* This file establishes the cfadmin application, as well as creates handles
* to the services using the factory via CFOBJECT.
*
* Copyright (c) 2001 Macromedia. All Rights Reserved.
* DO NOT REDISTRIBUTE THIS SOFTWARE IN ANY WAY WITHOUT THE EXPRESSED
* WRITTEN PERMISSION OF MACROMEDIA.
--->

<!--- Set multi-language utf-8 values here
---------------------------------------------------------------------->
<cfprocessingdirective pageencoding="utf-8">

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<cfset URLenChar = "utf-8" >

<!--- Set encoding to utf-8. --->
<cfset setEncoding("URL", "utf-8")>
<cfset setEncoding("Form", "utf-8")>

<cfparam name="url.login" default="">

<!--- Set the output encoding to utf-8 --->
<cfcontent type="text/html; charset=utf-8">

</head>

<cfset SESSION.locale='es'>

<!--- <div id="home_contents"> --->
<style type="text/css">
<!--
.style2 {color: #ff0000}
-->
</style>

<div id="content">
<table align="center" width="100%">

<tr><center>

<div align="center" style="width:100%; font-size:13px; font-weight:500; color:#000000; "><br />
<a href="http://www.cfsan.fda.gov/~dms/fsbtac23.html" target="_blank" class="style2" >*** Important Information (please read)! ***<br />
FDA Fact Sheet ScoringAg has the Solution! </a><br />
<a href="Public/docs/Acciones de la FDA en la nueva legislacion del Bioterrorismo.pdf" target="_blank" class="style2">Haga clic para aquí ver
los Hechos de los USA FDA - en Español</a> <br />
<br />
<cfscript>ssite.translate('#SESSION.Locale#', 1, 111);</cfscript></div><br />

Correct answer (Community Beginner, Jan 19, 2007): removing the DateFormat call from the copyright clause at the bottom of the page fixed the problem. The full reply is the last post in the thread.

Engaged, Jan 18, 2007

could the problem be in the ssite.translate function? can you post the code for it? how does it use the Session.locale?
another thing to check would be if the Spanish text in the db is stored correctly. could be you are seeing exactly what is stored in the db...
one other thing to try: add &characterSetResults=UTF-8 to the end of the JDBC URL in CF Admin. I have just finished a website in the Lao language with CFMX7 and MySQL, and that little line made it possible...
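
For reference, the three connection properties mentioned so far in the thread would sit together on the JDBC URL roughly as shown below. This is a minimal Python sketch that only assembles the string. `build_jdbc_url` is a hypothetical helper, and the host, port, and schema name are placeholders, not values from the poster's setup.

```python
from urllib.parse import urlencode


def build_jdbc_url(host: str, port: int, database: str, **properties: str) -> str:
    """Assemble a MySQL JDBC URL with the given connection properties."""
    base = f"jdbc:mysql://{host}:{port}/{database}"
    if not properties:
        return base
    # urlencode keeps keyword-argument order and leaves "UTF-8" unchanged.
    return f"{base}?{urlencode(properties)}"


# Placeholder host and schema; only the three properties come from the thread.
url = build_jdbc_url(
    "localhost", 3306, "translations_db",
    useUnicode="true",
    characterEncoding="UTF-8",
    characterSetResults="UTF-8",
)
print(url)
```

The first property follows a `?` and every later one follows an `&`, which is why the suggestion above is written as `&characterSetResults=UTF-8`.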

Engaged, Jan 18, 2007

one other thing you can do with your code is move your
<!--- Set encoding to utf-8. --->
<cfset setEncoding("URL", "utf-8")>
<cfset setEncoding("Form", "utf-8")>

as well as

<!--- Set the output encoding to utf-8 --->
<cfcontent type="text/html; charset=utf-8">

into your Application.cfm file. You will then not need to have that in any other pages, only <cfprocessingdirective pageencoding="utf-8">. And remember to include that line as the first line of EVERY page, even if a page is only used through <cfinclude>!!!

Community Beginner, Jan 18, 2007

Sabaidee: here is the ssite.translate function. Session.locale is used to look up the locale and return the translation of the text; for example, a locale of es returns Spanish text for all dynamic text on the site. As for the
<!--- Set encoding to utf-8. --->
<cfset setEncoding("URL", "utf-8")>
<cfset setEncoding("Form", "utf-8")>
lines, I placed them in that page just to see if it would work. Hey, sometimes you have to try something stupid to get to the correct answer.

Dags

<cfcomponent>

<cfprocessingdirective pageencoding="utf-8">

<cffunction name="translate" returntype="string" output = "no">
<cfargument name="locale" default="en">
<cfargument name="t_id">
<cfargument name="page_id">

<!---<cftry>--->
<cfquery name="translations" datasource="#request.tracebackDSN#">
SELECT #locale# as locale
FROM translations
WHERE 0 = 0
AND translation_id = #t_id#
AND page_id = #page_id#
</cfquery>
<!---<cfcatch type="any">Translation Error</cfcatch>
</cftry>--->

<!--- Remove any leading and trailing spaces SM@05/13/2005 --->
<cfset tValue = #translations.locale#>
<cfset tValue = trim(tValue)>

<cfreturn tValue>
</cffunction>

<cffunction name="btranslate">
<cfargument name="locale" default="en">
<cfargument name="t_id">
<cfargument name="page_id">

<cftry>
<cfquery name="translations" datasource="#request.tracebackDSN#">
SELECT #locale# as locale
FROM translations
WHERE 0 = 0
AND translation_id = #t_id#
AND page_id = #page_id#
</cfquery>
<cfcatch type="any">Translation Error</cfcatch>
</cftry>

<!--- Remove any leading and trailing spaces SM@05/13/2005 --->
<cfset tValue = #translations.locale#>
<cfset tValue = trim(tValue)>

<cfreturn trim(tValue)>
</cffunction>
</cfcomponent>

Engaged, Jan 18, 2007

you don't have to tell me about trying something stupid to get it right: so many of those things i have tried in order to make my Lao language site work...

anyway, the translate function does not seem to have anything to do with it, so back to the other checks:
- did you try adding &characterSetResults=UTF-8 to the JDBC url?
- did you check that the data stored in the db is actually correct? (use something like phpMyAdmin or similar to access the db and browse the tables to see the contents of the fields)

also, check what are the collation and character encoding settings in your db.

is the Spanish text displayed the same on your development machine or is it displayed correctly? in my case with Lao language everything worked fine on my comp, but on the server it all got screwed up, and the problem was solved by upgrading the server's CF MySql drivers to the latest version. oh, i see you are using connector/J 5.0... so should not be a problem... not sure about MySql 4... i am using 5...

try the above 2 suggestions and let me know if they helped.

Community Beginner, Jan 18, 2007

I am waiting for the server company to reboot the ColdFusion server to see whether &characterSetResults=UTF-8 works. I have looked into the database and everything looks correct, but I don't know Spanish or German, so I am not sure. I should know more in about 20 minutes.

LEGEND, Jan 18, 2007

dagamache wrote:
> of the JDBC the ?useUnicode=true&characterEncoding=UTF-8 as suggested by
> another forum. I have even checked all the pages properties to make sure that
> they are in UTF-8 encoding format, below is a sample of the code I am using

and is your mySQL database's encoding actually utf-8?

also, viewing data via the db's tools is often misleading when it comes to
encoding: what looks ok in the tool might end up as mojibake or garbage when
passed thru a JDBC driver.

Community Beginner, Jan 18, 2007

I have checked the database both from the admin tools and by looking into it directly. Everything looks fine in the tools; only the JDBC results are messed up.

David Gamache

LEGEND, Jan 18, 2007

dagamache wrote:
> I have checked both the database from the admin tools and from looking into
> it directly and everything looks fine in the tools just the JDBC returns are
> messed up

i guess you missed the part where i said *not* to trust the db tools for this?

is the db encoding actually utf-8? was the data actually entered as utf-8? how
was the data entered? can you dump out the data as a simple text file?
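
One way to act on these questions is to look at the raw bytes of a single value (for example via MySQL's `HEX()` function on the `es` column) and compare them with what each encoding would produce. The sketch below is only an illustration in Python. `identify_storage` is a hypothetical helper, and the word "fácil" is used as the sample because it appears in the site's Spanish text.

```python
def identify_storage(raw: bytes, expected: str) -> str:
    """Compare raw column bytes with the byte forms the expected text could take."""
    utf8 = expected.encode("utf-8")
    candidates = {
        "utf-8": utf8,
        "latin-1": expected.encode("latin-1", errors="replace"),
        # UTF-8 bytes that were misread as Latin-1 and then encoded as UTF-8 again.
        "double-encoded utf-8": utf8.decode("latin-1").encode("utf-8"),
    }
    for label, encoded in candidates.items():
        if raw == encoded:
            return label
    return "unknown"


# Hex strings as a byte dump might show them for the word "fácil".
print(identify_storage(bytes.fromhex("66c3a163696c"), "fácil"))      # utf-8
print(identify_storage(bytes.fromhex("66e163696c"), "fácil"))        # latin-1
print(identify_storage(bytes.fromhex("66c383c2a163696c"), "fácil"))  # double-encoded utf-8
```

A sample with at least one accented character is needed here, because pure ASCII text has the same bytes in all three forms.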

Community Beginner, Jan 18, 2007

OK, that did not work. I have added characterSetResults=UTF-8, and the collation and character encoding is utf8_unicode_ci for all languages that we have translations for.

Engaged, Jan 18, 2007

hmm.... makes me wonder if it has anything to do with the text from db being returned through a cfc... i am not sure what default encoding/character set is used by CF in that case and how to change it.
does PaulH know, being an Adobe Community Expert? i am trying to find it in the docs and google for it now...

i would start by moving the <cfprocessingdirective...> line to the top of the cfc, before the <cfcomponent> tag....

i would also try another "stupid" thing and change <cfset setEncoding(...> for both form and url scopes in your Application.cfm to:
<cfscript>
setEncoding("form", "utf-8");
setEncoding("url", "utf-8");
</cfscript>

and remove them from other pages.

Community Beginner, Jan 18, 2007

Sabaidee, I have been googling, reading, and looking at docs for two days now, and this one has me stumped. It does not make any sense, but that is what I live for: any time you find these problems, you have to go beat your head on a wall :)

Community Beginner, Jan 18, 2007

Invalid CFML construct found on line 3 at column 1.

is the error you get when you place the cfprocessingdirective tag before the cfcomponent tag, but hey, at this point maybe beating my head on the wall will work

David Gamache

LEGEND, Jan 18, 2007

> Invalid CFML construct found on line 3 at column 1.
>
> is the error you get when you place the cfprocessingdirective tag before the
> cfcomponent tag, but hey at this point maybe beating my head on the wall will
> work

You only need <cfprocessingdirective> if the FILE ITSELF has UTF-8
characters in it. You DO NOT need it if it's simply processing UTF-8 data.

I think you need to go back to basics, and do a bit of unit testing. Write
a CF template that simply has a <cfquery> which pulls some of your data
back from the DB, and outputs it on the screen. Does that work?

(remember to have the <meta> tag and the <cfcontent> tag too).

--
Adam

Community Beginner, Jan 18, 2007

One of the first things I did was use the following code to see if a simple page could display the translations; you can see the results below. I am at my wits' end here. Folks, someone out there surely knows what is wrong and can tell me how to fix it.

David Gamache
<cfprocessingdirective pageencoding="UTF-8">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

<cfcontent type="text/html; charset=UTF-8">
<cfscript>
SetEncoding("form", "utf-8");
SetEncoding("url", "utf-8");
</cfscript>
<title>Untitled Document</title>
</head>

<body>
<cfquery name="translations2" datasource="#request.tracebackDSN#">
SELECT *
FROM translations
WHERE 0 = 0
AND translation_id = '1111'
AND page_id = '1'
</cfquery>
<cfoutput query="translations2">#translations2.es#
<br />
HELLO<br />
#SESSION.locale#
</cfoutput>
</body>
</html>

Establezca y Mantenga que sus Registros en ScoringAg f�cil y econ�mico Protegen la Provisi�
HELLO
es
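
The "�" marks in the output above are the Unicode replacement character. One common way such output arises, offered here only as an illustration and not as a diagnosis of this site, is Latin-1 bytes being read as UTF-8. The Python sketch below reproduces both directions of that mismatch on a fragment of the same Spanish text.

```python
original = "fácil y económico"

# Latin-1 bytes read as UTF-8: each accented letter is an invalid byte
# sequence, so a lenient decoder substitutes U+FFFD (displayed as "�").
latin1_read_as_utf8 = original.encode("latin-1").decode("utf-8", errors="replace")

# UTF-8 bytes read as Latin-1: each accented letter becomes two characters.
utf8_read_as_latin1 = original.encode("utf-8").decode("latin-1")

print(latin1_read_as_utf8)   # f�cil y econ�mico
print(utf8_read_as_latin1)   # fÃ¡cil y econÃ³mico
```

The first pattern matches the shape of the garbled output in this post; the second is the other familiar form of mojibake, where one accented letter turns into a pair such as "Ã¡".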

LEGEND, Jan 18, 2007

I did not review all the previous suggestions, so pardon me if this has
already been tried and failed. When I set the encoding on pages, I
do so as follows.


<cfprocessingdirective pageencoding="UTF-8">

<cfscript>
// note: only needed if you are submitting and/or receiving form values,
// but does not hurt to always have available.
SetEncoding("form", "utf-8");
SetEncoding("url", "utf-8");
</cfscript>

<!--- Note the reset parameter in the cfcontent tag: it clears
anything that has already been generated for this response. I put it on
the same line as the doctype so there is no extra white space or blank lines
that can throw off some IE browser versions. --->

<cfcontent type="text/html; charset=UTF-8" reset="yes"><!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

<title>Untitled Document</title>
</head>

<body>

LEGEND, Jan 18, 2007

Adam Cameron wrote:
> You only need <cfprocessingdirective> if the FILE ITSELF has UTF-8
> characters in it. You DO NOT need it if it's simply processing UTF-8 data.

actually it's good practice to use <cfprocessingdirective> as the BOM is
optional for utf-8.

LEGEND, Jan 18, 2007

Sabaidee wrote:
> hmm.... makes me wonder if it has anything to do with the text from db being
> returned through a cfc... i am not sure what default encoding/character set is
> used by CF in that case and how to change it.

for cfmx (cf6 & above) it's utf-8, that should be common knowledge. for cf5 &
older versions it's supposed to be latin-1 but cf never really paid much
attention to encoding in those versions.

LEGEND, Jan 19, 2007

> actually it's good practice to use <cfprocessingdirective> as the BOM is
> optional for utf-8.

Which would only be relevant IF the file contained UTF-8 data. Like I
said.

You can't tell me it's "good practice" to include that tag on EVERY FILE in
an application, "just in case". Because that would only lead me to ask why
- if it's such good practice - it is that CF cannot work out for itself
that the file has UTF-8 content(*), and why it's up to the developer to
tell it. You can't have it both ways.

Do YOU, Paul, put <cfprocessingdirective> at the top of ALL your files?

Anyway, "just in case" scenarios should not apply to source code, should
they? The developer will (well: SHOULD) know whether their templates have
UTF-8 data within them.

--
Adam

(*) Especially when the file DOES have a UTF-8 BOM.

LEGEND, Jan 19, 2007

Adam Cameron wrote:
> You can't tell me it's "good practice" to include that tag on EVERY FILE in
> an application, "just in case". Because that would only lead me to ask why

actually that's exactly what i'm telling you. unless you have 100% perfect
control over all your cf pages, all the time, somebody can come along & edit them.

> - if it's such good practice - it is that CF cannot work out for itself
> that the file has UTF-8 content(*), and why it's up to the developer to
> tell it. You can't have it both ways.

once again, the BOM is optional.

> Do YOU, Paul, put <cfprocessingdirective> at the top of ALL your files?

for real work, pretty much so, those are my good practices. i do admit to
knocking tests/demos off without it.

> Anyway, "just in case" scenarios should not apply to source code, should
> they? The developer will (well: SHOULD) know whether their templates have
> UTF-8 data within them.

see above.

LEGEND, Jan 19, 2007

>actually that's exactly what i'm telling you. unless you have 100% perfect control over all your cf pages, all the time, somebody can come along & edit them.

Well it's either me or someone on my team. All of which are developers,
rather than gibbons, so ought to know what they're doing.

We can agree to disagree, which is fine, but I think your practice is a
poor one. It's far more useful to leave the simple ASCII files alone, and
IFF a file has UTF-8 content in it, for whatever reason, THEN mark it
accordingly. It is then a flag to anyone reviewing it that it's there,
like a warning "yes, I meant it to be like this, there is a reason, HEED".


>> - if it's such good practice - it is that CF cannot work out for itself
>> that the file has UTF-8 content(*), and why it's up to the developer to
>> tell it. You can't have it both ways.
>
> once again, the BOM is optional.

Sure. Which implies that it's not adequate to rely on it being there. So
the responsibility falls onto the application reading the file to determine
whether the content is UTF-8 or not. If NOTEPAD can manage it, I puzzle as
to why CF cannot, and relies on people like you to put
<cfprocessingdirective> at the top of every template.

--
Adam

LEGEND, Jan 19, 2007

Adam Cameron wrote:
> Well it's either me or someone on my team. All of which are developers,
> rather than gibbons, so ought to know what they're doing.

you assume too much.

> We can agree to disagree, which is fine, but I think your practice is a poor
> one. It's far more useful to leave the simple ASCII files alone, and IFF a
> file has UTF-8 content in it, for whatever reason, THEN mark it accordingly.

again, the BOM is optional.

> Sure. Which implies that it's not adequate to rely on it being there. So the
> responsibility falls onto the application reading the file to determine
> whether the content is UTF-8 or not. If NOTEPAD can manage it, I puzzle as
> to why CF cannot, and relies on people like you to put
> <cfprocessingdirective> at the top of every template.

only if the BOM isn't there--once again it's optional.

LEGEND, Jan 19, 2007

> again, the BOM is optional.

I think we could be talking at cross-purposes. Either that or one or both
of us is being dense.

If I create a NEW text file in notepad.exe, it defaults to ANSI. If I then
insert into that file UTF-8 content, notepad.exe NOTICES this, and when I
go to save it as ANSI (no BOM), says "well... you better not... you'll
mangle your data". So notepad.exe can tell when a file has UTF-8 content
WITHOUT the BOM being there. As it should. Like you said.

As you say, the BOM is entirely optional. So an application needs to use
*some other mechanism* to detect if it should be parsing as plain old ASCII
text, or whether it needs to treat it as UTF-8.

If notepad.exe can do this without a special <cfprocessingdirective-like>
tag, or a BOM, then blimin' CF should be capable of doing it too. One
certainly should NOT have to MANUALLY advise CF - in EVERY FILE - what it
should be doing. Bloody ridiculous.

--
Adam
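
The kind of "other mechanism" being argued about here can be sketched in a few lines of Python. This is only an illustration of one heuristic (check for a BOM, then test whether the bytes are valid UTF-8); it is not a description of how Notepad or ColdFusion actually decide, and `sniff_encoding` is a hypothetical name.

```python
import codecs


def sniff_encoding(data: bytes) -> str:
    """Guess how to read a file from its bytes alone."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 (BOM)"
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return "not utf-8"
    # Pure ASCII is also valid UTF-8, so report it separately.
    return "ascii" if text.isascii() else "utf-8 (no BOM)"


print(sniff_encoding(codecs.BOM_UTF8 + b"hola"))   # utf-8 (BOM)
print(sniff_encoding("fácil".encode("utf-8")))     # utf-8 (no BOM)
print(sniff_encoding("fácil".encode("latin-1")))   # not utf-8
print(sniff_encoding(b"plain text"))               # ascii
```

The check is a heuristic, not a proof: some byte sequences written in another encoding happen to be valid UTF-8 as well, which is one argument for stating the encoding explicitly.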

LEGEND, Jan 19, 2007

>> Well it's either me or someone on my team. All of which are developers,
>> rather than gibbons, so ought to know what they're doing.
>
> you assume too much.

I would rather pick up the problem and deal with it (by instructing the
miscreant of the ins and outs of UTF-8 and CF's incapabilities in that
regard), than have a sledge-hammer/walnut approach such as yours.

There's also the fact that in over 3000 CF templates (>10MB of raw
character data) in our (multi-lingual, I might add) software, there is not
yet one instance of UTF-8 data being present on a CF template.
Which kinda puts into perspective how sensible - in my mind - it is to
globally "deal with" a situation that is in fact not that common. Of
course our s/w is not statistically representative of everyone's situation,
but it's some sort of measure.

But go your hardest... I'm not trying to convince you to do anything other
than what already makes you happy. I *am* perhaps trying to offer an
alternative position to your opinion it's a "good practice", though, I
guess.

--
Adam

Community Beginner, Jan 19, 2007

And while you two were debating the issue, I removed the DateFormat call in the copyright clause at the bottom of the page: problem fixed. Don't ask why; I don't know, but it works now. Go figure. Now I move on to my next problem, a real-time video feed of a cow walking. Don't ask, I just do, just do :)

David Gamache
