28 Replies Latest reply on Jan 20, 2007 11:52 PM by BKBK

    Problem with UTF-8 encoding

    dagamache
      The problem is that although I have finally gotten the static text to display correctly, the dynamic text, which is queried from a MySQL database, is not being displayed correctly.

      I have checked the database: the Spanish, French, and other translations of the contents are there with the correct lettering. I have updated the MySQL drivers to 5.0 as recommended by Adobe, and I have added ?useUnicode=true&characterEncoding=UTF-8 to the JDBC URL string, as suggested on another forum. I have even checked the properties of every page to make sure they are in UTF-8 encoding. Below is a sample of the code I am using. What is wrong with the code, or what do I need to change to fix this problem? You may check the site at www.scoringag.com and try the language translations to see further examples of the problem.

      We are using CFMX 7, MySQL 4.1, and Connector/J 5.0.

      Sample code below:
      <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
      <html xmlns="http://www.w3.org/1999/xhtml">
      <head>

      <!---
      **
      * CF MX Admin "Application.cfm" file
      * This file establishes the cfadmin application, as well as creates handles
      * to the services using the factory via CFOBJECT.
      *
      * Copyright (c) 2001 Macromedia. All Rights Reserved.
      * DO NOT REDISTRIBUTE THIS SOFTWARE IN ANY WAY WITHOUT THE EXPRESSED
      * WRITTEN PERMISSION OF MACROMEDIA.
      --->

      <!--- Set multi-language utf-8 values here
      ---------------------------------------------------------------------->
      <cfprocessingdirective pageencoding="utf-8">

      <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
      <cfset URLenChar = "utf-8" >

      <!--- Set encoding to utf-8. --->
      <cfset setEncoding("URL", "utf-8")>
      <cfset setEncoding("Form", "utf-8")>

      <cfparam name="url.login" default="">

      <!--- Set the output encoding to utf-8 --->
      <cfcontent type="text/html; charset=utf-8">

      </head>

      <cfset SESSION.locale='es'>

      <!--- <div id="home_contents"> --->
      <style type="text/css">
      <!--
      .style2 {color: #ff0000}
      -->
      </style>

      <div id="content">
      <table align="center" width="100%">

      <tr><center>

      <div align="center" style="width:100%; font-size:13px; font-weight:500; color:#000000; "><br />
      <a href="http://www.cfsan.fda.gov/~dms/fsbtac23.html" target="_blank" class="style2" >*** Important Information (please read)! ***<br />
      FDA Fact Sheet ScoringAg has the Solution! </a><br />
      <a href="Public/docs/Acciones de la FDA en la nueva legislacion del Bioterrorismo.pdf" target="_blank" class="style2">Haga clic para aquí ver
      los Hechos de los USA FDA - en Español</a> <br />
      <br />
      <cfscript>ssite.translate('#SESSION.Locale#', 1, 111);</cfscript></div><br />

        • 1. Re: Problem with UTF-8 encoding
          azadisaryev Level 1
          could the problem be in the ssite.translate function? can you post the code for it? how does it use the Session.locale?
          another thing to check would be if the spanish text in the db is stored correctly. could be you are seeing exactly what is stored in the db...
          one other thing to try: add &characterSetResults=UTF-8 to the end of the JDBC URL in CF Admin. I have just finished a website in Lao language with CFMX7 and MySQL, and that little line has made it possible...
          • 2. Re: Problem with UTF-8 encoding
            azadisaryev Level 1
            one other thing you can do with your code is move your
            <!--- Set encoding to utf-8. --->
            <cfset setEncoding("URL", "utf-8")>
            <cfset setEncoding("Form", "utf-8")>

            as well as

            <!--- Set the output encoding to utf-8 --->
            <cfcontent type="text/html; charset=utf-8">

            into your Application.cfm file. You will then not need them in any other page, only <cfprocessingdirective pageencoding="utf-8">. And remember to include that line as the first line of EVERY page, even if a page is only used through <cfinclude>!!!
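
            A minimal sketch of what such an Application.cfm might look like. The application name "scoringagDemo" is a made-up placeholder, and enabling session management is an assumption based on the pages using SESSION.locale:

            ```cfml
            <!--- Application.cfm: runs before every requested page --->
            <cfprocessingdirective pageencoding="utf-8">

            <!--- "scoringagDemo" is a placeholder, not the real application name --->
            <cfapplication name="scoringagDemo" sessionmanagement="yes">

            <!--- Decode incoming URL and form values as UTF-8 --->
            <cfset setEncoding("url", "utf-8")>
            <cfset setEncoding("form", "utf-8")>

            <!--- Send every response as UTF-8 HTML --->
            <cfcontent type="text/html; charset=utf-8">
            ```

            Any page that returns something other than HTML (a PDF or an image, say) would need its own <cfcontent> to override this default.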

            • 3. Re: Problem with UTF-8 encoding
              dagamache Level 1
              Sabaidee: here is the ssite.translate function. Session.locale is used to look up the locale and return the translation of the text; for example, a locale of es returns Spanish text for all dynamic text in the site. As for the <!--- Set encoding to utf-8. --->
              <cfset setEncoding("URL", "utf-8")>
              <cfset setEncoding("Form", "utf-8")> lines, I placed them in that page just to see if it would work. Hey, sometimes you have to try something stupid to get to the correct answer.

              Dags

              <cfcomponent>

              <cfprocessingdirective pageencoding="utf-8">

              <cffunction name="translate" returntype="string" output = "no">
              <cfargument name="locale" default="en">
              <cfargument name="t_id">
              <cfargument name="page_id">

              <!---<cftry>--->
              <cfquery name="translations" datasource="#request.tracebackDSN#">
              SELECT #locale# as locale
              FROM translations
              WHERE 0 = 0
              AND translation_id = #t_id#
              AND page_id = #page_id#
              </cfquery>
              <!---<cfcatch type="any">Translation Error</cfcatch>
              </cftry>--->

              <!--- Remove any leading and trailing spaces SM@05/13/2005 --->
              <cfset tValue = #translations.locale#>
              <cfset tValue = trim(tValue)>

              <cfreturn tValue>
              </cffunction>

              <cffunction name="btranslate">
              <cfargument name="locale" default="en">
              <cfargument name="t_id">
              <cfargument name="page_id">

              <cftry>
              <cfquery name="translations" datasource="#request.tracebackDSN#">
              SELECT #locale# as locale
              FROM translations
              WHERE 0 = 0
              AND translation_id = #t_id#
              AND page_id = #page_id#
              </cfquery>
              <cfcatch type="any">Translation Error</cfcatch>
              </cftry>

              <!--- Remove any leading and trailing spaces SM@05/13/2005 --->
              <cfset tValue = #translations.locale#>
              <cfset tValue = trim(tValue)>

              <cfreturn trim(tValue)>
              </cffunction>
              </cfcomponent>
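
              A side note on the component above, unrelated to the encoding issue: the locale, t_id and page_id arguments go into the SQL unescaped, and the query result is not var-scoped. A hedged sketch of a safer translate function follows. The list of allowed locales is hypothetical, and treating translation_id and page_id as integers is an assumption about the table:

              ```cfml
              <cfcomponent output="false">

                <!--- Hypothetical whitelist: adjust to the real column names in the translations table --->
                <cfset variables.allowedLocales = "en,es,fr,de">

                <cffunction name="translate" returntype="string" output="false">
                  <cfargument name="locale" type="string" default="en">
                  <cfargument name="t_id" type="numeric" required="true">
                  <cfargument name="page_id" type="numeric" required="true">

                  <!--- var-scope locals so requests sharing this CFC instance cannot overwrite each other --->
                  <cfset var translations = "">
                  <cfset var col = "en">

                  <!--- Only accept a known column name; anything else falls back to "en" --->
                  <cfif listFindNoCase(variables.allowedLocales, arguments.locale)>
                    <cfset col = lCase(arguments.locale)>
                  </cfif>

                  <cfquery name="translations" datasource="#request.tracebackDSN#">
                    SELECT #col# AS locale_text
                    FROM translations
                    WHERE translation_id = <cfqueryparam value="#arguments.t_id#" cfsqltype="cf_sql_integer">
                      AND page_id = <cfqueryparam value="#arguments.page_id#" cfsqltype="cf_sql_integer">
                  </cfquery>

                  <!--- An empty result set yields an empty string --->
                  <cfreturn trim(translations.locale_text)>
                </cffunction>

              </cfcomponent>
              ```

              Existing calls such as ssite.translate('#SESSION.Locale#', 1, 111) would keep working, since the argument order is unchanged.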
              • 4. Re: Problem with UTF-8 encoding
                azadisaryev Level 1
                you don't have to tell me about trying something stupid to get it right: so many of those things i have tried in order to make my Lao language site work...

                anyway, the translate function does not seem to have anything to do with it, so back to the other checks:
                - did you try adding &characterSetResults=UTF-8 to the JDBC url?
                - did you check that the data stored in the db is actually correct? (use something like phpMyAdmin or similar to access the db and browse the tables to see the contents of the fields)

                also, check what are the collation and character encoding settings in your db.

                is the Spanish text displayed the same on your development machine or is it displayed correctly? in my case with Lao language everything worked fine on my comp, but on the server it all got screwed up, and the problem was solved by upgrading the server's CF MySql drivers to the latest version. oh, i see you are using connector/J 5.0... so should not be a problem... not sure about MySql 4... i am using 5...

                try the above 2 suggestions and let me know if they helped.
                • 5. Re: Problem with UTF-8 encoding
                  dagamache Level 1
                  I am waiting for the server company to reboot the ColdFusion server to see if &characterSetResults=UTF-8 works. I have looked in the database and everything looks correct, but I don't know Spanish or German, so I am not sure. I should know more in about 20 minutes.
                  • 6. Re: Problem with UTF-8 encoding
                    Level 7
                    dagamache wrote:
                    > of the JDBC the ?useUnicode=true&characterEncoding=UTF-8 as suggested by
                    > another forum. I have even checked all the pages properties to make sure that
                    > they are in UTF-8 encoding format, below is a sample of the code I am using

                    and is your mySQL database's encoding actually utf-8?

                    also, viewing data via the db's tools is often misleading when it comes to
                    encoding: what looks ok in the tool might end up as mojibake or garbage when
                    passed thru a JDBC driver.
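
                    One way to see what actually arrives through the JDBC driver, rather than what a db tool shows, is to print the code point of each character as ColdFusion received it. This is only a sketch: the es column is assumed from the locale naming used in the translate function, and the row ids are placeholders:

                    ```cfml
                    <!--- Fetch one translated string (placeholder row ids) --->
                    <cfquery name="check" datasource="#request.tracebackDSN#">
                      SELECT es
                      FROM translations
                      WHERE translation_id = 1111
                        AND page_id = 1
                    </cfquery>

                    <!--- Print the code point of each of the first 40 characters --->
                    <cfoutput>
                    <cfloop index="i" from="1" to="#min(len(check.es), 40)#">
                      #asc(mid(check.es, i, 1))#
                    </cfloop>
                    </cfoutput>
                    ```

                    A correctly decoded "á" is the single value 225. The pair 195 161 in its place would suggest UTF-8 bytes read as Latin-1, and 65533 (U+FFFD) would suggest bytes that were not valid UTF-8 being read as UTF-8.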

                    • 7. Re: Problem with UTF-8 encoding
                      dagamache Level 1
                      OK, that did not work. I have added characterSetResults=UTF-8, and the collation and character encoding are UTF8_unicode_ci for all languages that we have translations for.
                      • 8. Re: Problem with UTF-8 encoding
                        dagamache Level 1
                        I have checked the database both from the admin tools and by looking into it directly, and everything looks fine in the tools; only the JDBC results are messed up.

                        David Gamache
                        • 9. Re: Problem with UTF-8 encoding
                          azadisaryev Level 1
                          hmm.... makes me wonder if it has anything to do with the text from db being returned through a cfc... i am not sure what default encoding/character set is used by CF in that case and how to change it.
                          does PaulH know, being an Adobe Community Expert? i am trying to find it in the docs and google for it now...

                          i would start by moving the <cfprocessingdirective...> line to the top of the cfc, before the <cfcomponent> tag....

                          i would also try another "stupid" thing and change <cfset setEncoding(...> for both form and url scopes in your Application.cfm to:
                          <cfscript>
                          setEncoding("form", "utf-8");
                          setEncoding("url", "utf-8");
                          </cfscript>

                          and remove them from other pages.
                          • 10. Re: Problem with UTF-8 encoding
                            dagamache Level 1
                            Sabaidee, I have been googling, reading and looking at docs for two days now. This one has me stumped; it does not make any sense. But that is what I live for: any time you find these problems, you have to go beat your head on a wall :)

                            • 11. Re: Problem with UTF-8 encoding
                              dagamache Level 1
                              Invalid CFML construct found on line 3 at column 1.

                              is the error you get when you place the cfprocessingdirective tag before the cfcomponent tag, but hey, at this point maybe beating my head on the wall will work

                              David Gamache
                              • 12. Re: Problem with UTF-8 encoding
                                Level 7
                                > Invalid CFML construct found on line 3 at column 1.
                                >
                                > is the error you get when you place the cfprocessingdirective tag before the
                                > cfcomponent tag, but hey at this point maybe beating my head on the wall will
                                > work

                                You only need <cfprocessingdirective> if the FILE ITSELF has UTF-8
                                characters in it. You DO NOT need it if it's simply processing UTF-8 data.

                                I think you need to go back to basics, and do a bit of unit testing. Write
                                a CF template that simply has a <cfquery> which pulls some of your data
                                back from the DB, and outputs it on the screen. Does that work?

                                (remember to have the <meta> tag and the <cfcontent> tag too).

                                --
                                Adam
                                • 13. Re: Problem with UTF-8 encoding
                                  dagamache Level 1
                                  One of the first things I did was use the following code to see if a simple page could display the translations, and you can see the results below. I am at my wits' end here, folks; someone out there surely knows what is wrong and can tell me how to fix it.

                                  David Gamache
                                  <cfprocessingdirective pageencoding="UTF-8">
                                  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
                                  <html xmlns="http://www.w3.org/1999/xhtml">
                                  <head>
                                  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

                                  <cfcontent type="text/html; charset=UTF-8">
                                  <cfscript>
                                  SetEncoding("form", "utf-8");
                                  SetEncoding("url", "utf-8");
                                  </cfscript>
                                  <title>Untitled Document</title>
                                  </head>

                                  <body>
                                  <cfquery name="translations2" datasource="#request.tracebackDSN#">
                                  SELECT *
                                  FROM translations
                                  WHERE 0 = 0
                                  AND translation_id = '1111'
                                  AND page_id = '1'
                                  </cfquery>
                                  <cfoutput query="translations2">#translations2.es#
                                  <br />
                                  HELLO<br />
                                  #SESSION.locale#
                                  </cfoutput>
                                  </body>
                                  </html>

                                  Establezca y Mantenga que sus Registros en ScoringAg f�cil y econ�mico Protegen la Provisi�
                                  HELLO
                                  es
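
                                  The � characters in that output are a clue in themselves. A rough sketch for reproducing the two usual failure modes with a known string, so they can be compared against what the page shows. It assumes this test file is itself saved as UTF-8, and the variable names are made up for illustration:

                                  ```cfml
                                  <cfprocessingdirective pageencoding="utf-8">

                                  <cfset original = "fácil">

                                  <!--- Case 1: Latin-1 bytes wrongly decoded as UTF-8 --->
                                  <cfset latin1Bytes = charsetDecode(original, "iso-8859-1")>
                                  <cfset asUtf8 = charsetEncode(latin1Bytes, "utf-8")>

                                  <!--- Case 2: UTF-8 bytes wrongly decoded as Latin-1 --->
                                  <cfset utf8Bytes = charsetDecode(original, "utf-8")>
                                  <cfset asLatin1 = charsetEncode(utf8Bytes, "iso-8859-1")>

                                  <cfoutput>
                                  Case 1, code point of the second character: #asc(mid(asUtf8, 2, 1))#<br />
                                  Case 2, length: #len(asLatin1)# (the original has #len(original)# characters)<br />
                                  </cfoutput>
                                  ```

                                  If case 1 reports the replacement character (65533, U+FFFD), that matches the symptom above, which would point to single-byte (Latin-1 style) data being handed to something that expects UTF-8. Case 2 is the other common pattern, where each accented letter turns into two characters.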
                                  • 14. Re: Problem with UTF-8 encoding
                                    Level 7
                                    I did not review all the previous suggestions, so pardon me if this has
                                    already been tried and failed. But when I set the encoding on pages, I
                                    do so as follows.


                                    <cfprocessingdirective pageencoding="UTF-8">

                                    <cfscript>
                                    // Note: only needed if you are submitting and/or receiving form values,
                                    // but it does not hurt to always have it available.
                                    SetEncoding("form", "utf-8");
                                    SetEncoding("url", "utf-8");
                                    </cfscript>

                                    <!--- Note the reset parameter in the cfcontent tag: it clears
                                    anything that has already been generated for this response. I put it on
                                    the same line as the doctype so there is no extra white space or blank lines
                                    that can throw off some IE browser versions. --->

                                    <cfcontent type="text/html; charset=UTF-8" reset="yes"><!DOCTYPE html
                                    PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
                                    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
                                    <html xmlns="http://www.w3.org/1999/xhtml">
                                    <head>
                                    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

                                    <title>Untitled Document</title>
                                    </head>

                                    <body>
                                    • 15. Re: Problem with UTF-8 encoding
                                      Level 7
                                      dagamache wrote:
                                      > I have checked both the database from the admin tools and from looking into
                                      > it directly and everything looks fine in the tools just the JDBC returns are
                                      > messed up

                                      i guess you missed the part where i said *not* to trust the db tools for this?

                                      is the db encoding actually utf-8? was the data actually entered as utf-8? how
                                      was the data entered? can you dump out the data as a simple text file?
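
                                      A sketch of one way to produce such a dump from ColdFusion itself, so the bytes can be inspected in a hex viewer. The file name es_dump.txt is a placeholder, and the row ids are taken from the test page posted earlier in the thread:

                                      ```cfml
                                      <cfquery name="dump" datasource="#request.tracebackDSN#">
                                        SELECT es
                                        FROM translations
                                        WHERE translation_id = 1111
                                          AND page_id = 1
                                      </cfquery>

                                      <!--- es_dump.txt is a placeholder name; the file is written next to this template --->
                                      <cffile action="write"
                                              file="#expandPath('./es_dump.txt')#"
                                              output="#dump.es#"
                                              charset="utf-8">
                                      ```

                                      In the resulting file a correct "á" is the two bytes C3 A1. Seeing C3 83 C2 A1 instead would mean the string was already mangled before it was written, and EF BF BD would mean it already contained the replacement character.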
                                      • 16. Re: Problem with UTF-8 encoding
                                        Level 7
                                        Adam Cameron wrote:
                                        > You only need <cfprocessingdirective> if the FILE ITSELF has UTF-8
                                        > characters in it. You DO NOT need it if it's simply processing UTF-8 data.

                                        actually it's good practice to use <cfprocessingdirective> as the BOM is
                                        optional for utf-8.
                                        • 17. Re: Problem with UTF-8 encoding
                                          Level 7
                                          Sabaidee wrote:
                                          > hmm.... makes me wonder if it has anything to do with the text from db being
                                          > returned through a cfc... i am not sure what default encoding/character set is
                                          > used by CF in that case and how to change it.

                                          for cfmx (cf6 & above) it's utf-8, that should be common knowledge. for cf5 &
                                          older versions it's supposed to be latin-1 but cf never really paid much
                                          attention to encoding in those versions.
                                          • 18. Re: Problem with UTF-8 encoding
                                            Level 7
                                            > actually it's good practice to use <cfprocessingdirective> as the BOM is
                                            > optional for utf-8.

                                            Which would only be relevant IF the file contained UTF-8 data. Like I
                                            said.

                                            You can't tell me it's "good practice" to include that tag on EVERY FILE in
                                            an application, "just in case". Because that would only lead me to ask why
                                            - if it's such good practice - it is that CF cannot work out for itself
                                            that the file has UTF-8 content(*), and why it's up to the developer to
                                            tell it. You can't have it both ways.

                                            Do YOU, Paul, put <cfprocessingdirective> at the top of ALL your files?

                                            Anyway, "just in case" scenarios should not apply to source code, should
                                            it? The developer will (well: SHOULD) know whether their templates have
                                            UTF-8 data within it.

                                            --
                                            Adam

                                            (*) Especially when the file DOES have a UTF-8 BOM.
                                            • 19. Re: Problem with UTF-8 encoding
                                              Level 7
                                              Adam Cameron wrote:
                                              > You can't tell me it's "good practice" to include that tag on EVERY FILE in
                                              > an application, "just in case". Because that would only lead me to ask why

                                              actually that's exactly what i'm telling you. unless you have 100% perfect
                                              control over all your cf pages, all the time, somebody can come along & edit them.

                                              > - if it's such good practice - it is that CF cannot work out for itself
                                              > that the file has UTF-8 content(*), and why it's up to the developer to
                                              > tell it. You can't have it both ways.

                                              once again, the BOM is optional.

                                              > Do YOU, Paul, put <cfprocessingdirective> at the top of ALL your files?

                                              for real work, pretty much so, those are my good practices. i do admit to
                                              knocking tests/demos off without it.

                                              > Anyway, "just in case" scenarios should not apply to source code, should
                                              > it? The developer will (well: SHOULD) know whether their templates have
                                              > UTF-8 data within it.

                                              see above.

                                              • 20. Re: Problem with UTF-8 encoding
                                                Level 7
                                                >actually that's exactly what i'm telling you. unless you have 100% perfect control over all your cf pages, all the time, somebody can come along & edit them.

                                                Well it's either me or someone on my team. All of which are developers,
                                                rather than gibbons, so ought to know what they're doing.

                                                We can agree to disagree, which is fine, but I think your practice is a
                                                poor one. It's far more useful to leave the simple ASCII files alone, and
                                                IFF a file has UTF-8 content in it, for whatever reason, THEN mark it
                                                accordingly. It is then a flag to anyone reviewing it that it's there,
                                                like a warning "yes, I meant it to be like this, there is a reason, HEED".


                                                >> - if it's such good practice - it is that CF cannot work out for itself
                                                >> that the file has UTF-8 content(*), and why it's up to the developer to
                                                >> tell it. You can't have it both ways.
                                                >
                                                > once again, the BOM is optional.

                                                Sure. Which implies that it's not adequate to rely on it being there. So
                                                the responsibility falls onto the application reading the file to determine
                                                whether the content is UTF-8 or not. If NOTEPAD can manage it, I puzzle as
                                                to why CF cannot, and relies on people like you to put
                                                <cfprocessingdirective> at the top of every template.

                                                --
                                                Adam
                                                • 21. Re: Problem with UTF-8 encoding
                                                  Level 7
                                                  Adam Cameron wrote:
                                                  > Well it's either me or someone on my team. All of which are developers,
                                                  > rather than gibbons, so ought to know what they're doing.

                                                  you assume too much.

                                                  > We can agree to disagree, which is fine, but I think your practice is a poor
                                                  > one. It's far more useful to leave the simple ASCII files alone, and IFF a
                                                  > file has UTF-8 content in it, for whatever reason, THEN mark it accordingly.

                                                  again, the BOM is optional.

                                                  > Sure. Which implies that it's not adequate to rely on it being there. So the
                                                  > responsibility falls onto the application reading the file to determine
                                                  > whether the content is UTF-8 or not. If NOTEPAD can manage it, I puzzle as
                                                  > to why CF cannot, and relies on people like you to put
                                                  > <cfprocessingdirective> at the top of every template.

                                                  only if the BOM isn't there--once again it's optional.

                                                  • 22. Re: Problem with UTF-8 encoding
                                                    Level 7
                                                    > again, the BOM is optional.

                                                    I think we could be talking @ cross-purposes. Either that or one or both
                                                    of us is being dense.

                                                    If I create a NEW text file in notepad.exe, it defaults to ANSI. If I then
                                                    insert into that file UTF-8 content, notepad.exe NOTICES this, and when I
                                                    go to save it as ANSI (no BOM), says "well... you better not... you'll
                                                    mangle your data". So notepad.exe can tell when a file has UTF-8 content
                                                    WITHOUT the BOM being there. As it should. Like you said.

                                                    As you say, the BOM is entirely optional. So an application needs to use
                                                    *some other mechanism* to detect if it should be parsing as plain old ASCII
                                                    text, or whether it needs to treat it as UTF-8.

                                                    If notepad.exe can do this without a special <cfprocessingdirective-like>
                                                    tag, or a BOM, then blimin' CF should be capable of doing it too. one
                                                    certainly should NOT have to MANUALLY advise CF - in EVERY FILE - what it
                                                    should be doing. Bloody ridiculous.

                                                    --
                                                    Adam
                                                    • 23. Re: Problem with UTF-8 encoding
                                                      Level 7
                                                      >> Well it's either me or someone on my team. All of which are developers,
                                                      >> rather than gibbons, so ought to know what they're doing.
                                                      >
                                                      > you assume too much.

                                                      I would rather pick up the problem and deal with it (by instructing the
                                                      miscreant of the ins and outs of UTF-8 and CF's incapabilities in that
                                                      regard), than have a sledge-hammer/walnut approach such as yours.

                                                      There's also the fact that in over 3000 CF templates (>10MB of raw
                                                      character data) in our (multi-lingual, I might add) software, there is not
                                                      yet one instance of there being UTF-8 data being present on a CF template.
                                                      Which kinda puts into perspective how sensible - in my mind - it is to
                                                      globally "deal with" a situation that is in fact not that common. Of
                                                      course our s/w is not statistically representative of everyone's situation,
                                                      but it's some sort of measure.

                                                      But go your hardest... I'm not trying to convince you to do anything other
                                                      than what already makes you happy. I *am* perhaps trying to offer an
                                                      alternative position to your opinion it's a "good practice", though, I
                                                      guess.

                                                      --
                                                      Adam
                                                      • 24. Problem with UTF-8 encoding
                                                        dagamache Level 1
                                                        And while you two are debating the issue, I removed the DateFormat() call in the copyright clause at the bottom of the page: problem fixed. Don't ask why; I don't know, but it works now. Go figure. Now I move on to my next problem, a real-time video feed of a cow walking. Don't ask, I just do, just do :)

                                                        David Gamache
                                                        • 25. Re: Problem with UTF-8 encoding
                                                          Level 7
                                                          > And while you two are debating the issue...

                                                          Heh. Oops.

                                                          > I removed the DateFormat() call in the
                                                          > copyright clause

                                                          Can you post the relevant line of code?

                                                          I thought you said the problem was content coming from the DB?

                                                          --
                                                          Adam
                                                          • 26. Re: Problem with UTF-8 encoding
                                                            dagamache@scoringsystem
                                                            bad code
                                                            <img src="#images#button_green.jpg" alt="" border="0"> <a href="http://www.scoringsystem.com/scoringsystem/sandbox/copyright/copyright.cfm" target="_blank">&copy;#objtranslate.translate('#SESSION.Locale#', 4, 11)# 2002 - #DateFormat(now(), "yyyy")# / #objtranslate.translate('#SESSION.Locale#', 4, 15)#<!---&copy; Copyright 2002 - now()/Terms of Service---></a>

                                                            changed code
                                                            <img src="#images#button_green.jpg" alt="" border="0"> <a href="http://www.scoringsystem.com/scoringsystem/sandbox/copyright/copyright.cfm" target="_blank">&copy;#objtranslate.translate('#SESSION.Locale#', 4, 11)# 2002 - 2007 / #objtranslate.translate('#SESSION.Locale#', 4, 15)#<!---&copy; Copyright 2002 - now()/Terms of Service---></a>

                                                            Why it did and why it is now working I don't know or care; I will leave that to the experts like you.

                                                            David Gamache
                                                            • 27. Re: Problem with UTF-8 encoding
                                                              Level 7
                                                              > why it did and why it is now working I don't know or care, I will leave that to
                                                              > the experts like you.

                                                              I'm not much of an expert in this case, I'm afraid (not that, apparently,
                                                              you care ;-).

                                                              How strange.

                                                              I presume you had been making other changes to this template during this
                                                              mission, not just this one? I only ask because I have known the compiled
                                                              classes to become corrupt. "Nudging" the source code then forces a
                                                              recompile and the problem goes away, which makes it look as though
                                                              something unrelated, like dateFormat(), had been causing the problem.

                                                              What happens if you put it back in? (if you care to try).

                                                              --
                                                              Adam
                                                              • 28. Re: Problem with UTF-8 encoding
                                                                BKBK Adobe Community Professional & MVP
                                                                Dagamache,

                                                                It appears you wanted the year to change dynamically, which is better than setting a static value. Here is a suggestion. The main difference is that there are no # signs around SESSION.Locale: the variable is passed to translate() directly instead of being wrapped in a quoted string. You could also consider using year(now()) in place of dateformat(now(), "yyyy").


                                                                <cfset trans4_11 = objtranslate.translate(SESSION.Locale, 4, 11)>
                                                                <cfset trans4_15 = objtranslate.translate(SESSION.Locale, 4, 15)>
                                                                <cfset yearEnd = DateFormat(now(), "yyyy")>

                                                                <cfoutput>
                                                                <img src="#images#button_green.jpg" alt="" border="0"> <a href="http://www.scoringsystem.com/scoringsystem/sandbox/copyright/copyright.cfm" target="_blank">&copy;#trans4_11# 2002 - #yearEnd# / #trans4_15#</a>
                                                                </cfoutput>
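
                                                                For reference, the year(now()) alternative mentioned above might look something like this. It is only a minimal sketch: the variable name yearEnd is carried over from the suggestion, and the rest of the copyright line is left out so the snippet runs on its own.

```cfml
<!--- year() returns the year of a date as a number, so no format mask is needed --->
<cfset yearEnd = year(now())>

<cfoutput>
&copy; 2002 - #yearEnd#
</cfoutput>
```

                                                                The output should match what DateFormat(now(), "yyyy") gives; the only difference is that the date is not run through a format mask.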