1 Reply. Latest reply: Apr 20, 2010 10:59 PM by myDwayneSmith

    ID CS4 — editing XML in ePUB file

    myDwayneSmith Community Member

      Hi all

       

      I'm about to start preparing ePUB files for a particular distributor whose instructions state:

      "Text encoding should all be in UTF-8 (this can be checked in the OPF, the first line should be <?xml version="1.0" encoding="UTF-8" ?>)."

       

      When I check my initial test files in PDFXML Inspector (with pretty printing turned off — thanks Gabriel Powell) the first line states only “<?xml version="1.0"?>”.

       

      I ran a test by simply adding encoding="UTF-8" to that line, and everything seems to work fine in Calibre and Digital Editions (as it did with the unedited version; there seems to be no difference).

       

      BUT, is this the correct way to do things or am I inadvertently setting myself up for disaster?

      Does InDesign automatically export with text encoded as UTF-8 — or some other format?

      Are there other settings in InDesign which alter the way text is encoded and the XML is generated?
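
A note on the encoding question, with a minimal sketch. Under the XML 1.0 rules, a file with no encoding declaration and no byte-order mark is read as UTF-8, so `<?xml version="1.0"?>` does not by itself mean the file is in some other encoding. Adding `encoding="UTF-8"` by hand is only safe if the bytes really are UTF-8, and that can be tested directly. The helper names below (`declared_encoding`, `decodes_as_utf8`) are made up for illustration; they are not part of InDesign or any ePub tool.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration at the start of a file.
_DECL = re.compile(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')
UTF8_BOM = b"\xef\xbb\xbf"


def declared_encoding(data):
    """Return the encoding named in the XML declaration, or None if none is given."""
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]  # skip a UTF-8 byte-order mark if present
    m = _DECL.match(data)
    return m.group(1).decode("ascii") if m else None


def decodes_as_utf8(data):
    """Return True if the raw bytes are valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    # Hypothetical OPF content as InDesign CS4 reportedly writes it (no encoding given).
    opf = b'<?xml version="1.0"?>\n<package version="2.0"/>'
    print(declared_encoding(opf), decodes_as_utf8(opf))
```

If `decodes_as_utf8` returns True for the extracted OPF, hand-editing the declaration to name UTF-8 should describe the file accurately. If it returns False, the declaration would be wrong and the file needs re-encoding first.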

       

      AND — my other question...

      This distributor is also asking for additional metadata entries.

      Can I integrate them anywhere within the metadata or are there particular ordering protocols that need to be adhered to?
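
On the metadata question, a rough sketch of adding an entry to the OPF `<metadata>` block with Python's standard library. My understanding is that OPF 2.0 does not require a particular order among the Dublin Core elements, but treat that as an assumption and check it against the distributor's own specification and a validator such as epubcheck. The sample OPF, the identifier value, and the `add_dc_element` helper are all hypothetical.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Register prefixes so the serialized output keeps readable element names.
ET.register_namespace("", OPF_NS)
ET.register_namespace("dc", DC_NS)

# Trimmed, made-up OPF used only to demonstrate the edit.
SAMPLE_OPF = b"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Sample Title</dc:title>
    <dc:identifier id="bookid">urn:uuid:hypothetical-id</dc:identifier>
    <dc:language>en</dc:language>
  </metadata>
</package>
"""


def add_dc_element(opf_bytes, name, text):
    """Append a Dublin Core element (e.g. 'publisher') to the end of <metadata>."""
    root = ET.fromstring(opf_bytes)
    metadata = root.find(f"{{{OPF_NS}}}metadata")
    if metadata is None:
        raise ValueError("no <metadata> element found in the OPF")
    element = ET.SubElement(metadata, f"{{{DC_NS}}}{name}")
    element.text = text
    # Serialize as UTF-8 with an explicit declaration (requires Python 3.8+).
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    print(add_dc_element(SAMPLE_OPF, "publisher", "Example Press").decode("utf-8"))
```

Appending at the end of `<metadata>` is simply the least invasive choice here. If the distributor does specify an order, insert at the position they ask for instead.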

       

      As you can see, I'm an XML ignoramus and any assistance will be most gratefully received.

       

      d.

        • 1. Re: ID CS4 — editing XML in ePUB file
          myDwayneSmith Community Member

          OK — update to my question for anyone else who might be brave enough to tackle the ePub format from InDesign....

           

          Including embeddable fonts when exporting to Digital Editions seems to cause all sorts of issues if you need to further edit the ePub content files (which you probably WILL need to do because ePubs from InDesign CS4 seem not to comply with IDPF standards).

           

          Exporting a 'clean' ePub (no embedded fonts, style names only, etc.) seems to fix the problem I had with text encoding. All XHTML files now start with:

          <?xml version="1.0" encoding="utf-8"?>
          <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
          <html xmlns="http://www.w3.org/1999/xhtml">

           

          So, now all the weird stuff I was seeing in Calibre has resolved itself.
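
For anyone who wants to confirm the same thing without opening each file by hand: an ePub is a ZIP archive, so the first line of every content file can be listed with a few lines of Python. This is only a sketch. The `first_lines` helper is made up, and the `.xhtml` suffix is an assumption about how the export names its content files.

```python
import zipfile


def first_lines(epub_file, suffix=".xhtml"):
    """Return {member name: first line as text} for matching files in the ePub."""
    result = {}
    with zipfile.ZipFile(epub_file) as zf:
        for name in zf.namelist():
            if name.lower().endswith(suffix):
                first = zf.read(name).split(b"\n", 1)[0]
                # errors="replace" keeps the report going even if the bytes are not UTF-8.
                result[name] = first.decode("utf-8", errors="replace").strip()
    return result


def missing_utf8_declaration(epub_file, suffix=".xhtml"):
    """List the files whose first line does not name utf-8 as the encoding."""
    return sorted(
        name
        for name, line in first_lines(epub_file, suffix).items()
        if 'encoding="utf-8"' not in line.lower()
    )
```

Calling `missing_utf8_declaration("book.epub")` (the file name is a placeholder) on a clean export like the one described above should return an empty list. Any names it does return are the files worth inspecting.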