6 Replies Latest reply on May 27, 2010 11:57 PM by hfday

    editing XML in ePub files (also posted in ID forum)

    myDwayneSmith Level 1

      Hi all


      I posted this in the InDesign forum a few days ago but got no response, so I'm trying here. My apologies for the double-up.


      I'm about to start preparing ePUB files (from InDesign CS4) for a particular distributor whose instructions state:

      "Text encoding should all be in UTF-8 (this can be checked in the OPF, the first line should be “<?xml version="1.0" encoding="UTF-8" ?>” ".


      When I check my initial test files using PDFXML Inspector (with pretty printing turned off — thanks Gabriel Powell) the first line states only “<?xml version="1.0"?>”.


      I ran a test by simply adding encoding="UTF-8" to that line, and everything seems to work fine in Calibre and Digital Editions (as it did with the unedited version; there seems to be no difference).
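
      For anyone who wants to check this without opening each file by hand, here is a minimal sketch in Python (standard library only). It lists the encoding each XML file in the ePub declares and whether its bytes really decode as UTF-8. The function name and the list of file extensions are my own assumptions, not anything InDesign or the distributor specifies. The XML specification treats a file with no encoding declaration and no byte order mark as UTF-8, so adding the declaration by hand should only be a problem if the bytes are not actually UTF-8.

```python
import re
import zipfile

# Matches the encoding pseudo-attribute inside a leading XML declaration.
_DECL = re.compile(rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')

# Assumed set of XML-based files worth checking; adjust as needed.
_XML_SUFFIXES = (".opf", ".ncx", ".xhtml", ".html", ".xml")


def report_encodings(epub_path):
    """Return {member name: (declared encoding or None, decodes as UTF-8?)}."""
    report = {}
    with zipfile.ZipFile(epub_path) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(_XML_SUFFIXES):
                continue
            data = zf.read(name)
            # Skip a UTF-8 byte order mark if one is present.
            body = data[3:] if data.startswith(b"\xef\xbb\xbf") else data
            match = _DECL.match(body)
            declared = match.group(1).decode("ascii") if match else None
            try:
                data.decode("utf-8")
                is_utf8 = True
            except UnicodeDecodeError:
                is_utf8 = False
            report[name] = (declared, is_utf8)
    return report
```

      Calling report_encodings("book.epub") (a hypothetical file name) gives one entry per XML file. A result of (None, True) means the declaration is missing but the content is valid UTF-8, which is the case where adding encoding="UTF-8" by hand should be harmless.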


      BUT, is this the correct way to do things or am I inadvertently setting myself up for disaster?

      Does InDesign automatically export with text encoded as UTF-8 — or some other format?

      Are there other settings in InDesign which alter the way text is encoded and the XML is generated?


      AND — my other question...

      This distributor is also asking for additional metadata entries.

      Can I integrate them anywhere within the metadata or are there particular ordering protocols that need to be adhered to?


      As you can see, I'm an XML ignoramus and any assistance will be most gratefully received.



        • 1. Re: editing XML in ePub files (also posted in ID forum)
          myDwayneSmith Level 1

          OK — update to my question for anyone else who might be brave enough to tackle the ePub format from InDesign....


          Including embeddable fonts when exporting to Digital Editions seems to cause all sorts of issues if you need to further edit the ePub content files (which you probably WILL need to do because ePubs from InDesign CS4 seem not to comply with IDPF standards).


          Exporting a 'clean' ePub (no embedded fonts, style names only, etc.) seems to fix the problem I had with text encoding. All the XHTML files now begin with:

          <?xml version="1.0" encoding="utf-8"?>
          <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
          <html xmlns="http://www.w3.org/1999/xhtml">


          So, now all the weird stuff I was seeing in Calibre has resolved itself.

          • 2. Re: editing XML in ePub files (also posted in ID forum)

            I hope that with your new expertise you might be able to help me. I'm trying to edit the XHTML files that InDesign CS4 outputs as an ePub. I've extracted them from the archive, used Dreamweaver to edit the XHTML, re-archived them (I use WinRAR) and renamed the archive with the .epub suffix. The eReader can't open this file; it says it's corrupt. I don't see how else to edit the files. Any clues as to what I'm doing wrong, please?



            • 3. Re: editing XML in ePub files (also posted in ID forum)
              Jim Lester Level 4

              I'm not sure how to do this in WinRAR, but the ePub format is rather explicit on one point of packaging: the mimetype file must be the first file in the archive, and it must be uncompressed. From the command line it can be done like so:

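              # Add mimetype first, stored uncompressed (-0) and without extra file attributes (-X)
              zip -v0X target mimetype
              # Add everything else recursively, skipping the archive itself and mimetype
              zip -vr target * -x target.zip mimetype
              # zip names the archive target.zip, so rename it to .epub
              mv target.zip target.epub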


              (copied from http://www.openebook.org/forums/viewtopic.php?t=85)
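
              If the zip command isn't available (on Windows, say), roughly the same packaging can be sketched with Python's standard zipfile module. This is only an illustration: the function names pack_epub and check_container are made up for this example, and the output file is assumed to live outside the folder being packed. The check also reports a file that is not a ZIP archive at all, which is one possible cause of a "corrupt" error if an archiver saved the file in its own format (WinRAR's default is RAR) before it was renamed to .epub.

```python
import os
import zipfile

MIMETYPE = "application/epub+zip"


def pack_epub(src_dir, out_path):
    """Zip src_dir into out_path with 'mimetype' first and uncompressed.

    Assumes out_path is outside src_dir, so the archive does not pack itself.
    """
    with zipfile.ZipFile(out_path, "w") as zf:
        # Roughly what `zip -0X` does: store mimetype without compression.
        zf.write(os.path.join(src_dir, "mimetype"), "mimetype",
                 compress_type=zipfile.ZIP_STORED)
        for root, dirs, files in os.walk(src_dir):
            dirs.sort()  # deterministic order
            for fname in sorted(files):
                full = os.path.join(root, fname)
                arc = os.path.relpath(full, src_dir).replace(os.sep, "/")
                if arc == "mimetype":
                    continue  # already added as the first entry
                zf.write(full, arc, compress_type=zipfile.ZIP_DEFLATED)


def check_container(epub_path):
    """Return a list of packaging problems; an empty list means none found."""
    if not zipfile.is_zipfile(epub_path):
        return ["not a ZIP archive"]
    with zipfile.ZipFile(epub_path) as zf:
        entries = zf.infolist()
        if not entries or entries[0].filename != "mimetype":
            return ["mimetype is not the first entry"]
        problems = []
        if entries[0].compress_type != zipfile.ZIP_STORED:
            problems.append("mimetype is compressed")
        if zf.read("mimetype") != MIMETYPE.encode("ascii"):
            problems.append("mimetype content is not " + MIMETYPE)
        return problems
```

              Running check_container on the file the eReader rejects should at least say whether the packaging is at fault. It does not validate the content files, so an empty result only rules out the mimetype and container issues described above.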

              • 4. Re: editing XML in ePub files (also posted in ID forum)
                hfday Level 1

                Hi, thanks for that. I have now managed to edit the files (I just changed a header statement) and get them back into the zipped format, but the book looks complete rubbish! I'm experimenting with a trial copy of InDesign CS5, seeing how it handles the headers to allow new pages for each chapter, and then reproducing that (simple) code. But it seems to impact everything else too: still no page breaks, and huge fonts that don't resize. I guess I'm going to have to break out my life savings and upgrade to CS5.



                • 5. Re: editing XML in ePub files (also posted in ID forum)
                  myDwayneSmith Level 1

                  Hi Frances,


                  Please be aware that I have not got any ePubs to the retail stage as yet. I was working flat out on a bunch of ePubs for a few months when the project got put on the back burner, so my experience is limited.


                  I'm not sure what your process is, but InDesign CS4 does an OK job of creating ePubs, just so long as you set the files up correctly (e.g. the easiest way to get page breaks between chapters is to save each chapter as a separate InDesign file and then combine them in a book. ALSO, everything should be styled using paragraph, character and object styles).

                  Unless CS5 has SIGNIFICANTLY upgraded support for ePub, you might be wasting your money.


                  Check out the excellent tutorials by Gabriel Powell

                  He gives the most outstanding introduction to ePUB from InDesign I've yet seen.


                  A couple of things I found I needed to do to get ePubs looking right in anything other than ADE:

                  • I got a better result if I wrote my own CSS — InDesign does not do a good job of this at all, AFAIC. Export style names only and use these as the basis for your CSS.

                  • InDesign fails to list fonts in the manifest, so if you want to specify your own fonts you need to add the font files manually and reference them in the manifest. Don't bother trying to embed fonts at export; it seems to cause too many problems.

                  • The only thing I sometimes need to do in the XHTML files is fix a few non-alphabetic characters (like bullets, apostrophes, quotes, etc.). I've no idea why these sometimes fail and sometimes don't.
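
                  On the last point, one possible cause (a guess, I can't confirm it from my own files) is an encoding mismatch, where a reader interprets the bytes of a curly quote or bullet in the wrong encoding. A workaround that sidesteps the question is to replace every non-ASCII character with a numeric character reference, so the file contains only ASCII. Here is a minimal Python sketch; the function names are made up for illustration, and it assumes the file can be read as UTF-8 to begin with.

```python
def to_char_refs(xhtml_text):
    """Replace every non-ASCII character with a numeric character reference."""
    # "xmlcharrefreplace" is a built-in codec error handler that emits &#NNNN;
    return xhtml_text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def fix_file(path, source_encoding="utf-8"):
    """Rewrite one XHTML file in place using only ASCII characters.

    source_encoding is an assumption; change it if the file was saved
    in another encoding.
    """
    # newline="" keeps the file's existing line endings untouched.
    with open(path, encoding=source_encoding, newline="") as f:
        text = f.read()
    with open(path, "w", encoding="ascii", newline="") as f:
        f.write(to_char_refs(text))
```

                  For example, to_char_refs("It\u2019s \u2022 done") returns "It&#8217;s &#8226; done". This is a blunt tool: it also rewrites non-ASCII characters inside comments or embedded styles, where references are not interpreted, so check the result in both Calibre and ADE.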


                  I use both Calibre and ADE to view ePubs — because they will each highlight different issues.
                  I built my CSS and a few other bits and pieces in TextEdit, which makes it easy to re-use things like the CSS for multiple publications.
                  I use PDFXML Inspector to do the editing, which allows significant editing capability without the need to unpack the ePub.
                  And I use Terminal to repackage ePubs (i.e. after adding a fonts folder, or whatever), using the command lines in Gabriel Powell's tutorial.
                  In other words, I've spent nothing on additional software: it's all either free or already on the Mac.

                  It's a very steep, frustrating learning curve — LOTS of trial and LOTS of error. But it's also fun.
                  Good luck

                  • 6. Re: editing XML in ePub files (also posted in ID forum)
                    hfday Level 1

                    Wow! Many thanks for all that excellent advice. I have indeed seen the excellent video. I've used Calibre but not ADE, so I'll try that. You obviously have a lot more knowledge of CSS than I do. I am also on a very steep learning curve here.


                    I have, however, already done what you (and the video) suggest in using styles for everything and assembling a book from separate chapters (that's slowed down my work-flow very effectively!)  Everything works well except for the new page for a chapter in CS4, despite using the TOC idea.  In CS5 it works fine and I get a new page for each chapter.  I have compared the output files and see that CS5 also creates about ten files per chapter, whereas CS4 creates just a single file per chapter.  CS5 also includes a fonts list.  Although I don't think the changes for CS5 are truly significant, it may be that the reduced amount of work I need to do for each book could make it worthwhile upgrading.  (I have to convert nearly 50 books in the next couple of months).


                    I will, however, try out some more of your suggestions before committing myself, and I'm really grateful for your help.