    "Smart Quotes" and Extensions


      I'm working on a text processing extension and want to implement a "Convert special characters to safe HTML entities" menu option (that's just a working title!). When I copy and paste blocks of text from Word into the editor, then right-click my selection and choose my extension to process that text, the text sent to my handler in the XML packet has Word's smart quotes (&#8220; and &#8221;) replaced with question marks. This happens before my handler processes anything; I'm just using cfdump to look at the data. Am I missing some intermediary processing that needs to be done? I can't figure out where it would be done, since I don't have any control over the selected editor content until my handler is fired.
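For reference, the conversion the menu option is aiming for can be sketched outside ColdFusion. The following is a minimal Python illustration, not the extension's code: the function name `to_numeric_entities` is hypothetical, and it assumes the text arrives intact as Unicode, which is exactly what is failing in the question.

```python
def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric character reference."""
    # 'xmlcharrefreplace' rewrites each character that ASCII cannot hold as &#NNNN;
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


sample = "\u201cHello\u201d"  # Word-style opening and closing double quotes
print(to_numeric_entities(sample))  # &#8220;Hello&#8221;
```

This only handles non-ASCII characters; markup characters such as & and < would need separate escaping.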





        Re: "Smart Quotes" and Extensions
          JR "Bob" Dobbs

          What is the character encoding of the page and the XML packet? You might try explicitly setting the encoding to UTF-8.
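One way question marks can appear is when text is written out in an encoding that has no slot for the character. The Python sketch below only illustrates that general mechanism; whether the extension pipeline does anything like this is an assumption, not something the thread confirms.

```python
closing_quote = "\u201d"  # right double quotation mark, as produced by Word

# ISO-8859-1 has no code for U+201D, so a lossy encoder substitutes "?"
lossy = closing_quote.encode("iso-8859-1", errors="replace")

# UTF-8 can represent it, using three bytes
safe = closing_quote.encode("utf-8")

print(lossy)  # b'?'
print(safe)   # b'\xe2\x80\x9d'
```

If every step between the editor and the handler uses UTF-8, the character should survive; a single non-Unicode step is enough to lose it.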

          Re: "Smart Quotes" and Extensions
            SidianConsulting

            The encoding is UTF-8 (which is the default in CF 9, and maybe CF 8 too), but I explicitly set it anyway, and still no luck. It seems like Word smart quotes are actually seen by ColdFusion as 3 different characters. I don't know that smart quotes are actually representable as specific entities, though. When I copy a closing smart quote out of Word into a cfm file and do something like <cfset q = "{smart quote here}">, and then loop over that variable one character at a time and output the ASCII values, I get 3 values: 226, 8364, 65533.
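Those three values are what results when the UTF-8 bytes of a closing smart quote (U+201D, a single character that can be written as &#8221;) are decoded as Windows-1252 instead of UTF-8. The Python sketch below reproduces the numbers. That the cfm file was saved as UTF-8 but read with a Windows-1252 page encoding is an assumption that fits the output, not something confirmed in the thread.

```python
closing_quote = "\u201d"

# UTF-8 stores U+201D as the three bytes E2 80 9D
utf8_bytes = closing_quote.encode("utf-8")

# Reading those bytes as Windows-1252 yields one character per byte;
# 0x9D is unassigned in that code page, so it becomes U+FFFD
misread = utf8_bytes.decode("cp1252", errors="replace")

codes = [ord(ch) for ch in misread]
print(codes)  # [226, 8364, 65533]
```

Because the last byte is replaced with U+FFFD, the damage cannot be reliably undone afterward, so the fix would have to be reading the bytes as UTF-8 in the first place.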


            Here's a test that contains smart quotes that I just copied from MS Word into this editor. It will be interesting to see how they are represented after I post this message:





            Re: "Smart Quotes" and Extensions
              SidianConsulting

              The code generated to display those Smart Quotes in my response is:



              I am not sure how to go about processing those in my pages, though. If I could figure that out,
              maybe I could figure out what is going on when those characters get passed through to my extension handler.