> I have a small question in regards to XML and escaping a character. I'm
> reading an XML file and parsing it, then re-creating the XML file with added
> items. One of the strings has a "-" in it and it spits this out in the XML:
> Any ideas on how to just read it as text and get it to show as a hyphen?
I imagine it's reading it fine, you're just not telling it how to output.
An &nbsp; is a non-breaking space, so that's what's either side of the
hyphen. Your hyphen is displaying as a question mark because it's
probably not just a simple "-" (ASCII 45), it's probably some extended
character like an m-dash (although not an —) or something like that.
What happens if you tell your browser to output the text as UTF-8 instead
of (whatever it is by default)?
I still get an error. It comes out a bit odd on text, on the read end of the XML; it looked like a box. It wasn't a plain hyphen, I did try to strip that out before I created the XML and it kept missing it. For the time being, I deleted it so I could test the script.
> I still get an error.
Hang on... are you getting an *error*, or are you just not seeing what you
want to see? Those are two different things.
> on the read end of the
> XML; it looked like a box.
How do you know it "looks" like a box? What are you looking at?
What happens if you browse directly to the XML file, rather than reading it
in CF and displaying it?
> It wasn't a plain hyphen, I did try to strip that
> out before I created the XML and it kept missing it.
Is there any indication in the XML file as to which character encoding
scheme it's using?
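If it helps, here is a rough sketch of checking what the file itself declares. It is in Python rather than CFML, purely for illustration; the function name `declared_encoding` and the sample bytes are made up for this example. It only looks at the XML declaration at the very start of the file and assumes there is no byte order mark in front of it.

```python
import re

def declared_encoding(xml_bytes, default="utf-8"):
    """Return the encoding named in the XML declaration, or `default` if none is given."""
    # Look only at the declaration at the start of the document, e.g.
    # <?xml version="1.0" encoding="iso-8859-1"?>
    match = re.match(
        rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']',
        xml_bytes,
    )
    if match is None:
        return default
    return match.group(1).decode("ascii").lower()

# Hypothetical sample documents, not taken from the poster's feed.
with_declaration = b'<?xml version="1.0" encoding="iso-8859-1"?><rss/>'
without_declaration = b'<rss/>'

print(declared_encoding(with_declaration))
print(declared_encoding(without_declaration))
```

Whatever this reports is only what the file claims. The bytes inside may still have been written by a tool using a different code page.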
Have you tried wrapping your code with a CDATA block? For example,
<![CDATA[ problem code here ]]>
Here is the definition from Microsoft's site:
"CDATA sections provide a way to tell the parser that there is no markup in the characters contained by the CDATA section. This makes it much easier to create documents containing sections where markup characters might appear, but where no markup is intended. CDATA sections are commonly used for scripting language content and sample XML and HTML content."
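As a small illustration of what that definition means in practice, here is a sketch using Python's standard `xml.dom.minidom` (not ColdFusion; the element name `description` and the sample text are assumptions for the example). It shows markup characters surviving a round trip inside a CDATA section. Note that CDATA only switches off markup parsing; as far as I know it does nothing about which character encoding the bytes are in.

```python
from xml.dom.minidom import Document, parseString

# Build a tiny document whose text contains characters that would
# normally have to be escaped ("<" and "&").
doc = Document()
description = doc.createElement("description")  # hypothetical element name
doc.appendChild(description)
description.appendChild(doc.createCDATASection("a < b & c"))

# Serialize just the element (no XML declaration) to see the CDATA wrapper.
xml_text = doc.documentElement.toxml()
print(xml_text)

# Parse it back: the text comes out unchanged, with no entity escaping needed.
round_tripped = parseString(xml_text).documentElement.firstChild.data
print(round_tripped)
```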
Sr. Web Applications Architect,
Macromedia Certified ColdFusion MX Advanced Developer
>Hang on... are you getting an *error*, or are you just not seeing what you
>want to see? Those are two different things.
I shouldn't say error, my bad. It didn't look correct when I viewed the XML through the browser. It looked like a graphic box. I was trying to figure out why I couldn't just read the XML and write it exactly as I read it into a new XML file and have it look the same when viewing it in the browser. Does that make sense?
>Is there any indication in the XML file as to which character encoding
>scheme it's using?
The encoding was: iso-8859-1
So, if I copy and paste the picture (from the XML) to here, it comes out like this "–". On the RSS validator, it states that the characters are this: \x96 = which also comes out as a hyphen here. I'm not so sure how to catch this in the future.
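For anyone who hits the same thing, here is a rough sketch of why \x96 behaves this way, in Python for illustration only (the sample string and the helper name `to_ascii_safe` are made up). In iso-8859-1 the byte 0x96 is an unprintable control character, while in Windows-1252 the same byte is an en dash. That would explain a box or question mark when the file is labelled iso-8859-1 but was really written as Windows-1252, which is only a guess about this particular feed.

```python
# Hypothetical sample containing the byte the validator flagged.
raw = b"ColdFusion \x96 XML"

as_latin1 = raw.decode("iso-8859-1")  # what the XML declaration claims
as_cp1252 = raw.decode("cp1252")      # what the source tool may really have used

# The 12th character (index 11) is the problem byte in both decodings.
latin1_codepoint = ord(as_latin1[11])  # C1 control character, not printable
cp1252_codepoint = ord(as_cp1252[11])  # U+2013 EN DASH

print(hex(latin1_codepoint), hex(cp1252_codepoint))

def to_ascii_safe(text):
    """Replace every non-ASCII character with a numeric character reference."""
    # Note: this does not escape "&" or "<"; it only handles non-ASCII characters.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

# Option 1: keep the dash, but write it as a numeric reference.
safe = to_ascii_safe(as_cp1252)
# Option 2: swap it for a plain ASCII hyphen.
plain = as_cp1252.replace("\u2013", "-")

print(safe)
print(plain)
```

Either option leaves the output pure ASCII, so it should display the same whether the reader assumes iso-8859-1, Windows-1252 or UTF-8.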
CDATA is what you need to do....
> The encoding was: iso-8859-1
Are you telling the browser to expect iso-8859-1?
Yes. It seems to work OK with the RSS feed. I thought it wasn't validated. But, it does seem to go through as a hyphen in a Feed Reader.