2 Replies Latest reply: Jul 14, 2008 11:07 AM by Dave DuPlantis

    Remove NULL character in XML by RegEx??

    mr. modus Community Member
      I have XML being returned that appears to have NULL characters in it. When I try to use XMLParse() I get the following error:

      An invalid XML character (Unicode: 0x0) was found in the element content of the document.

      If I save the XML to a .txt file and then read it back in, I can parse it. That's not really the way I want to do it, though, as it'll be slow. I'm sure this can be done with a regex. Any ideas?
        • 1. Re: Remove NULL character in XML by RegEx??
          mr. modus Community Member
          I messed around with a RegEx and came up with this:

          REReplace(thisXML,'[\x0]','','ALL')

          It seems to work, but I'm no Unicode or regex expert. If someone who knows their stuff with RegEx and Unicode could review my RegEx and tell me whether it's truly only removing NULLs, that would be great.
          • 2. Re: Remove NULL character in XML by RegEx??
            Dave DuPlantis
            Well, it's a relatively simple regex, so there isn't much to verify. You've got the right expression for hex code 0. You don't strictly need the brackets (which indicate a character class) for a single character, but it's easier to start with them so that you don't have to remember to add them once you find other characters to exclude.
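
            To illustrate the point about the brackets, here is a rough sketch. It assumes the same thisXML variable from the post above; withClass and withoutClass are made-up names, and I've written the escape with two hex digits (\x00), which is my own spelling rather than the one in the original post. A character class with a single member matches exactly what that member matches on its own, so both calls should give the same result.

            ```cfm
            <!--- Sketch only: withClass and withoutClass are hypothetical variable names. --->

            <!--- With a character class: easy to extend later with more characters. --->
            <cfset withClass = REReplace(thisXML, "[\x00]", "", "ALL")>

            <!--- Without the class: the same match for a single character. --->
            <cfset withoutClass = REReplace(thisXML, "\x00", "", "ALL")>

            <!--- Both strings should be identical; Compare() returns 0 when they are. --->
            <cfoutput>#Compare(withClass, withoutClass)#</cfoutput>
            ```

            If another character needs to go later, it only has to be added inside the brackets, for example "[\x00\x0B]" to also drop vertical tabs.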

            As near as I can tell, it should be what you want. You may end up wanting a more complicated regex if you find other invalid characters you want to remove (like byte order marks), but that could be done in a separate statement.
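
            If it does come to that, something along these lines might be a starting point. Treat it as a sketch under assumptions, not a tested fix: cleanXML and xmlDoc are made-up variable names, and it assumes any byte order mark has reached the string as the single character U+FEFF (if the content was decoded with the wrong charset, the mark will show up as different characters and this will not catch it). The character class covers the control characters below 0x20 that XML 1.0 does not allow, which is everything in that range except tab, line feed, and carriage return.

            ```cfm
            <!--- Sketch only: cleanXML and xmlDoc are hypothetical names; thisXML is from the post above. --->

            <!--- 1. Drop a leading byte order mark (U+FEFF, decimal 65279) if one is present.
                     No scope argument, so only the first match (at the start) is replaced. --->
            <cfset cleanXML = REReplace(thisXML, "^" & Chr(65279), "")>

            <!--- 2. Drop the control characters XML 1.0 forbids: 0x00-0x08, 0x0B, 0x0C, 0x0E-0x1F.
                     Tab (0x09), line feed (0x0A), and carriage return (0x0D) are left alone. --->
            <cfset cleanXML = REReplace(cleanXML, "[\x00-\x08\x0B\x0C\x0E-\x1F]", "", "ALL")>

            <!--- 3. Parse the cleaned string. --->
            <cfset xmlDoc = XMLParse(cleanXML)>
            ```

            Keeping the byte order mark in its own statement means the NULL-stripping regex stays simple, and each step can be dropped independently if it turns out not to be needed.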