5 Replies Latest reply on Mar 5, 2014 6:35 AM by doncx

    Encoding of UTF8 Characters in URL Strings

    doncx

      I need to link to a URL that has a right single quotation mark in it (u+2019).

       

      This is it: "/News/Case-Studies/UNICEF-Headquarters’-Redesigned-Lobby-Space"

       

      When I paste the link in to a browser, it works.

       

      When I create a link with this URL as the href and click on it, it works.

       

      When I put it in a <CFLocation> tag, it does not work.  I need to make this work.

       

      When I take the URL from a browser address bar and paste it into a text editor, it converts the right single quote to the percent-encoded string "%E2%80%99".  I have not been able to recreate this encoding. URLEncodedFormat() yields a completely different string.
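For reference, this is roughly what the browser does: it takes the UTF-8 bytes of every character outside the URL-safe set and writes each byte as %XX. The JavaScript sketch below (JavaScript only because the thread already uses it; percentEncodeUtf8 and encodePath are hypothetical helper names) encodes each path segment separately so the slashes survive. It also suggests one plausible, unconfirmed reason URLEncodedFormat() gave a different string: if the quote had already been garbled into three characters (as later posts show), encoding those three characters as UTF-8 produces nine bytes instead of %E2%80%99.

```javascript
// Hypothetical helper: percent-encode text as UTF-8, leaving only the
// RFC 3986 "unreserved" characters (A-Z a-z 0-9 - . _ ~) unencoded.
function percentEncodeUtf8(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes of the text
  let out = "";
  for (const b of bytes) {
    const ch = String.fromCharCode(b);
    if (/^[A-Za-z0-9\-._~]$/.test(ch)) {
      out += ch;
    } else {
      // Two uppercase hex digits per byte, e.g. 0xE2 -> "%E2"
      out += "%" + b.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

// Hypothetical helper: encode each path segment, keeping the "/" separators.
function encodePath(path) {
  return path.split("/").map(percentEncodeUtf8).join("/");
}

console.log(percentEncodeUtf8("\u2019"));
// -> %E2%80%99

// The same quote after being garbled into U+00E2, U+20AC, U+2122:
console.log(percentEncodeUtf8("\u00E2\u20AC\u2122"));
// -> %C3%A2%E2%82%AC%E2%84%A2

console.log(encodePath("/News/Case-Studies/UNICEF-Headquarters\u2019-Redesigned-Lobby-Space"));
// -> /News/Case-Studies/UNICEF-Headquarters%E2%80%99-Redesigned-Lobby-Space
```

Note that URLEncodedFormat() encodes every non-alphanumeric character, slashes included, so it suits a single segment or query value rather than a whole path.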

       

      When I parse the URL and check the asc() value of each character, I get three characters for the right single quote: 226, 8364, 8482.

       

      I came up with an encoding that worked, but it put special characters in the browser's address bar (I no longer remember what the technique was).

       

      What encoding can I perform on the CF server to reproduce the proper UTF-8 encoded string %E2%80%99?

       

      Any help would be appreciated.

        • 1. Re: Encoding of UTF8 Characters in URL Strings
          BKBK Adobe Community Professional & MVP

          If you convert U+2019 from hexadecimal to base 10, you get 8217. The character you want is then, in ColdFusion terms, chr(8217).

           

          The coding goes like this:

           

          <cfset base10Representation = inputBaseN(2019,16)>

          <cfset rightSingleQuotationMark = chr(base10Representation)>

          <cfset str="/News/Case-Studies/UNICEF-Headquarters" & rightSingleQuotationMark & "-Redesigned-Lobby-Space">

           

          Alternatively, you could do everything in one go, like this

           

          <cfset str2="/News/Case-Studies/UNICEF-Headquarters#chr(inputBaseN(2019,16))#-Redesigned-Lobby-Space">

          • 2. Re: Encoding of UTF8 Characters in URL Strings
            BKBK Adobe Community Professional & MVP

            Whoa! What you have found is probably a bug. I could reproduce it as follows:

             

            <cfset base10Representation = inputBaseN(2019,16)>

            <cfset rightSingleQuotationMark = chr(base10Representation)>

            <cfset str="http://127.0.0.1:8500/News/Case-Studies/UNICEF-Headquarters" & rightSingleQuotationMark & "-Redesigned-Lobby-Space">

            <cflocation  url="#str#">

             

            It replaces the quotation mark with a space, redirecting instead to

             

            http://127.0.0.1:8500/News/Case-Studies/UNICEF-Headquarters%20-Redesigned-Lobby-Space
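For comparison, a UTF-8 percent-encoded redirect target keeps the character as %E2%80%99 instead of turning it into %20. This JavaScript sketch uses the standard encodeURI(), which encodes non-ASCII characters but leaves the URL's own delimiters such as ":" and "/" alone. Handing an already-encoded URL like this to cflocation might sidestep the problem, but that was not tested in this thread.

```javascript
const host = "http://127.0.0.1:8500";
const path = "/News/Case-Studies/UNICEF-Headquarters\u2019-Redesigned-Lobby-Space";

// encodeURI() percent-encodes the quote as UTF-8 but keeps ":" and "/".
const expected = encodeURI(host + path);
console.log(expected);
// -> http://127.0.0.1:8500/News/Case-Studies/UNICEF-Headquarters%E2%80%99-Redesigned-Lobby-Space

// Decoding round-trips to the original quote; %20 decodes to a plain space.
console.log(decodeURI(expected) === host + path); // -> true
console.log(decodeURI("%20") === " "); // -> true
```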

             

            This is obviously wrong. As a workaround for cflocation, you could redirect with JavaScript:

             

            <script type="text/javascript">

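              // JSStringFormat() escapes quotes, backslashes and line breaks so str cannot break the string literal
              <cfoutput>window.location.replace("#JSStringFormat(str)#");</cfoutput>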

            </script>

            • 3. Re: Encoding of UTF8 Characters in URL Strings
              doncx Level 1

              I appreciate your replies, but I'm still mystified.

               

              My user has pasted in a URL with a Unicode right single quotation mark. I need to encode it as %E2%80%99.

               

              When I parse the right single quotation mark, I get three characters: chr(226) & chr(8364) & chr(8482)

               

              The first one correctly tells me that a three-byte UTF-8 sequence is beginning (asc 226, or hex E2).

               

              I would expect the next two to be asc 128 (hex 80) and asc 153 (hex 99), which I could then easily convert to %E2%80%99, the proper sequence for a right single quotation mark.
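The arithmetic that gets from those three bytes to 8217 is UTF-8 decoding: the lead byte 1110xxxx contributes 4 payload bits and each continuation byte 10xxxxxx contributes 6. A small JavaScript sketch (decodeUtf8ThreeByte is a hypothetical helper that handles only three-byte sequences and skips the overlong and surrogate checks a real decoder makes):

```javascript
// Hypothetical helper: decode one three-byte UTF-8 sequence to a code point.
function decodeUtf8ThreeByte(b0, b1, b2) {
  // Lead byte must be 1110xxxx, continuation bytes 10xxxxxx.
  if ((b0 & 0xf0) !== 0xe0 || (b1 & 0xc0) !== 0x80 || (b2 & 0xc0) !== 0x80) {
    throw new Error("not a three-byte UTF-8 sequence");
  }
  // (0xE2 & 0x0F) << 12 = 0x2000, (0x80 & 0x3F) << 6 = 0x0000, 0x99 & 0x3F = 0x19
  return ((b0 & 0x0f) << 12) | ((b1 & 0x3f) << 6) | (b2 & 0x3f);
}

const cp = decodeUtf8ThreeByte(0xe2, 0x80, 0x99);
console.log(cp, cp.toString(16)); // -> 8217 2019

// The built-in decoder agrees.
console.log(new TextDecoder().decode(Uint8Array.of(0xe2, 0x80, 0x99)) === String.fromCodePoint(cp)); // -> true
```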

               

              But the next two bytes report as different and significantly higher asc values (8364 and 8482).

               

              I don't see how I can possibly convert this properly.

               

              If I simply put a chr(8217) where the Unicode character was, it works, but how on earth do I arrive at that?

               

              How do I get from chr(226) & chr(8364) & chr(8482) to chr(8217)?

               

              Still looking for ideas.  Thanks.

              • 4. Re: Encoding of UTF8 Characters in URL Strings
                BKBK Adobe Community Professional & MVP

                I don't think you should proceed with chr(226) & chr(8364) & chr(8482). I do believe things were already messed up by the time you got there.

                 

                Those 3 characters display as ’. You got that representation because the right single quotation mark was decoded with the wrong encoding to start with. When you parse it using the proper encoding, for example UTF-8, you should get just one character, namely chr(8217).
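To make this concrete: the three characters are exactly what you get when the UTF-8 bytes E2 80 99 are read as Windows-1252, where byte 0x80 is € (U+20AC, 8364) and 0x99 is ™ (U+2122, 8482). The thread does not say where the wrong decoding happened, so this is an assumption, but if Windows-1252 was involved the damage can be reversed by turning each character back into its Windows-1252 byte and decoding those bytes as UTF-8. The JavaScript sketch below does that with a hand-written table (cp1252Decode, cp1252Encode and repairMojibake are hypothetical names). In CFML the counterpart would probably be charsetEncode(charsetDecode(garbled, "windows-1252"), "utf-8"), untested here; fixing the source, as doncx ends up doing, is the better option.

```javascript
// Windows-1252 code points for bytes 0x80-0x9F. Every other byte maps to the
// same-valued code point (as in ISO-8859-1). The five undefined bytes
// (0x81, 0x8D, 0x8F, 0x90, 0x9D) map to same-valued control characters.
const CP1252_80_9F = [
  0x20ac, 0x0081, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021,
  0x02c6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008d, 0x017d, 0x008f,
  0x0090, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
  0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, 0x009d, 0x017e, 0x0178,
];

// Read bytes as Windows-1252 (this is how the garbling happens).
function cp1252Decode(bytes) {
  let out = "";
  for (const b of bytes) {
    out += String.fromCharCode(b >= 0x80 && b <= 0x9f ? CP1252_80_9F[b - 0x80] : b);
  }
  return out;
}

// Turn text back into Windows-1252 bytes; throws if a character has no byte.
function cp1252Encode(text) {
  const bytes = [];
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    const idx = CP1252_80_9F.indexOf(cp);
    if (idx >= 0) bytes.push(0x80 + idx);
    else if (cp < 0x80 || (cp >= 0xa0 && cp <= 0xff)) bytes.push(cp);
    else throw new Error(`U+${cp.toString(16).toUpperCase()} has no Windows-1252 byte`);
  }
  return Uint8Array.from(bytes);
}

// Undo the garbling: recover the original bytes, then decode them as UTF-8.
// fatal: true makes invalid UTF-8 throw instead of yielding U+FFFD.
function repairMojibake(garbled) {
  return new TextDecoder("utf-8", { fatal: true }).decode(cp1252Encode(garbled));
}

const utf8Bytes = new TextEncoder().encode("\u2019"); // E2 80 99
const garbled = cp1252Decode(utf8Bytes);
console.log([...garbled].map((c) => c.codePointAt(0))); // -> [226, 8364, 8482]
console.log(repairMojibake(garbled).codePointAt(0)); // -> 8217
```

Only apply the repair to text you know was garbled: a correct ’ maps to the single byte 0x92, which is not valid UTF-8 on its own, so repairMojibake() throws on it.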

                • 5. Re: Encoding of UTF8 Characters in URL Strings
                  doncx Level 1

                  Ok, that pointed me in the right direction.

                   

                  The string, of course, is stored in a database table.  When I get it directly from there instead of reading it off the URL, the chr(8217) is properly represented.

                   

                  Conversion is then simple.

                   

                  Thanks for discussing it with me.