Adventures in ePub conversion

The ePub version of Leslie Kurke’s The Traffic in Praise (ISBN 9781939926012) has now been made available for sale on Lulu.com and submitted to a process of validation for sales through other channels, such as Barnes and Noble and Apple iBookstore (it may take several weeks before the item is approved and available on those sites).

The process of creating the ePub turned out to be more frustrating than expected. The scholarly series that were in partnership with the California Digital Library and UC Press were able to pay a fee to have their book converted to ePub format once the print edition had been completed. Lulu.com is now the publishing services partner instead of UC Press, and I had just assumed the same situation would apply. Lulu.com, however, offers conversion for free, but only from Word files, and if you have been working with InDesign, as I was, you are on your own. (There were of course Word files at an earlier stage, but too much had been edited and formatted since that time for them to be of any use now.)

Initially, then, I read some short explanations of creating an ePub from InDesign (the command to Export book to ePub is right next to the command to Export book to PDF), and I went ahead and generated the file, which required only minor tinkering after generation. Or so I thought.

(1) The special ePub table of contents file (toc.ncx) had to be revised so that chapter names would be present rather than the default filenames of all the separate xhtml documents that make up the parts of the book in this format.

(2) Page numbers, blank pages, manual page breaks, and any manually added running heads (added by override, as was necessary in the front matter file and in the index file) had to be edited out.

(3) For proper display of the front matter elements on separate pages, it was necessary to add a new paragraph style used only at the break points and then use the settings in InDesign to split the ePub files at that paragraph style.
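
The relabeling in item (1) can also be scripted. What follows is only a rough sketch, not the procedure used for this book: the function name relabel_toc and the mapping from file names to chapter titles are hypothetical, and it assumes toc.ncx uses the standard NCX namespace. Note that ElementTree drops the XML declaration and DOCTYPE on output, so those would need to be restored by hand.

```python
import xml.etree.ElementTree as ET

NCX = "http://www.daisy.org/z3986/2005/ncx/"


def relabel_toc(ncx_text, titles):
    """Replace navPoint labels using a {content src: chapter title} mapping.

    `titles` is a hypothetical mapping supplied by the editor, for example
    {"book-1.xhtml": "Chapter 1"}.
    """
    # Keep NCX as the default namespace when the tree is serialized again.
    ET.register_namespace("", NCX)
    root = ET.fromstring(ncx_text)
    for point in root.iter(f"{{{NCX}}}navPoint"):
        content = point.find(f"{{{NCX}}}content")
        label = point.find(f"{{{NCX}}}navLabel/{{{NCX}}}text")
        if content is None or label is None:
            continue  # leave malformed entries untouched
        src = content.get("src")
        if src in titles:
            label.text = titles[src]
    # The XML declaration and DOCTYPE are not included in this output.
    return ET.tostring(root, encoding="unicode")
```

Entries whose src is not in the mapping keep whatever label InDesign generated, so the mapping can be built up one chapter at a time.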

After a bit of learning and experimentation, I had a version that looked very good in Adobe Digital Editions and also pretty good in iBooks for Mac OS X, and a helpful colleague verified it on an iOS device. The Adobe reader had the best presentation, in that it respected the links between footnote numbers and the footnotes (which appear at the end of each chapter in this format: the only choice other than all at the end of the book), so one could automatically go back and forth between text and footnote. In iBooks the links did not function.

The next step, however, was to get this format for sale on Lulu.com and through other eBook vendors, and to do that it had to pass validation when uploaded to Lulu. Having a little experience with validators for JavaScript, HTML, XML, XSLT, and fonts, I am aware that some validators are better than others at giving actionable feedback about what is invalid. In certain circumstances, a validator may give a very opaque report, and some errors will not be real at all, but only reported because something went wrong earlier in the process of checking the file. Lulu’s validator immediately returned an upsetting number of errors that would block distribution to Barnes and Noble and the iBookstore and even simply uploading and selling at Lulu itself. This list included illegal image files, namespace errors, file permission errors, and files missing from the manifest. (See the end of this post for details.) The file that I thought was all ready was not going public yet.

Unfortunately, Lulu’s help documents are not helpful about these errors, and the frustrated users who ask questions at their users’ forum often get little help, except from people trying to sell services. I learned to use epubcheck 3.01 from the command line in Terminal, and I could make sufficient revisions to pass epubcheck but still failed validation at Lulu (which in fact uses epubcheck and then makes some additional tests specific to Nook and iBooks). I also compared the files and declarations in a couple of free ePub books to see how they differed from what I had generated from InDesign. Since the semester was beginning and I had enough other things to keep me busy, I was sorely tempted to try to hire a professional to solve the problem.

Eventually I found discussions on Adobe help forums and elsewhere that solved the problem. The main problem is the font embedding setting in InDesign. In the PDF that generates the POD hard copies, one must embed the fonts, and in the PDF the fonts are embedded as partial sets and are encrypted to protect the font-maker’s property rights. Our book was set in Minion Pro from Adobe, with the metrical breve symbol (used twice) supplied from KadmosU font, since Minion Pro doesn’t contain this symbol. When the embedding setting is used, InDesign adds a file encryption.xml within META-INF (which in other ePub files contains only one file, container.xml) and also encrypts the fonts. encryption.xml is not in the manifest, and although epubcheck 3.01 doesn’t mind its presence, the additional validators do, and the encrypted files are rejected as well. There is an excellent short video tutorial on this issue from Lynda.com at http://www.youtube.com/watch?v=_bWXfFsdSYw .
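
For anyone who wants to check for this condition directly, here is a minimal sketch in Python that lists the files inside an .epub that the manifest does not declare. It is not how Lulu’s validator works (that is unknown to me); the function name find_unmanifested is hypothetical, and the sketch assumes a single package file and manifest hrefs that are plain relative paths without URL escaping.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def find_unmanifested(epub_path):
    """Return archive members that the OPF manifest does not declare."""
    with zipfile.ZipFile(epub_path) as zf:
        # META-INF/container.xml points at the package (.opf) file.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", CONTAINER_NS).get("full-path")
        opf_dir = posixpath.dirname(opf_path)

        # Collect every manifest href, resolved relative to the OPF location.
        package = ET.fromstring(zf.read(opf_path))
        declared = {
            posixpath.normpath(posixpath.join(opf_dir, item.get("href")))
            for item in package.findall("opf:manifest/opf:item", OPF_NS)
        }

        # These members normally sit outside the manifest.
        exempt = {"mimetype", "META-INF/container.xml", opf_path}
        return sorted(
            name
            for name in zf.namelist()
            if not name.endswith("/")
            and name not in declared
            and name not in exempt
        )
```

On an export made with font embedding switched on, one would expect META-INF/encryption.xml to show up in the returned list.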

So the solution turned out to be as follows:

(1) I did not embed fonts when exporting to ePub from InDesign.

(2) I edited the .epub archive (which is actually a .zip with a different suffix) with BBEdit and Oxygen XML Editor. The latter is great because one can change file names and even add files to the archive. [Side note: thus you don’t have to unzip and zip the archive, which is a nuisance in OS X. If you change the suffix to .zip, double-clicking will not unzip it. You have to use the command line in Terminal. But then you also have to worry about compressing into a clean archive without the hidden files that OS X puts in zip archives created directly in the Finder.]

(3) InDesign leaves the title and several other important metadata elements blank in the content.opf file, so you need to supply the text for these elements (title, creator, description, publisher, date, rights, and identifier seem to be the essential ones for validation).

(4) InDesign renamed one of the two images, giving it the suffix .jpeg (both had been .jpg). I’m not sure whether this was necessary, but in Oxygen I changed the suffix back to .jpg and changed the corresponding references to this file in the manifest in content.opf and in the xhtml file for the title page where the href for this logo image was used.

(5) In toc.ncx I replaced the navMap element with a version that had the chapter titles as I wanted them to appear in the ePub TOC.

(6) I created a folder “fonts” in the archive (with Oxygen) and added to that folder the .ttf file for the regular version of KadmosU, which is embeddable by license with no requirement for encryption. I then added a declaration of this font file to the manifest in content.opf. To make sure all eBook readers pay attention to this font, it is also necessary to add a declaration of this font in a font-face entry in the css file (this is explained with an example of the syntax in the video tutorial referenced above).

(7) If I had been starting from scratch, I would have replaced Minion Pro in the InDesign styles with something else (like Gentium), and added Gentium in the same way. But since I had spent excessive time on this already, I simply left the styles as they were. The eBook readers will use a default serif font. One cannot manually add an unencrypted Minion Pro font file since this is a font with license restrictions. Adobe was thus legally correct to insist on encrypting it, but InDesign is very unhelpful in encrypting fonts that don’t need to be encrypted and in giving no warning that encrypted fonts will actually be a problem EXCEPT in Adobe Digital Editions (or at least will cause validation problems, even if the readers would not be bothered).
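
As an alternative to the Terminal route mentioned in the side note to step (2), an unzipped ePub folder can be repacked with a few lines of Python. This is only a sketch under stated assumptions: pack_epub is a hypothetical name, the folder is assumed to contain a mimetype file at its top level, and the output file should be written somewhere outside that folder. It writes the mimetype entry first and uncompressed, as the ePub container format requires, and leaves out the hidden files that OS X tends to add.

```python
import os
import zipfile


def pack_epub(src_dir, epub_path):
    """Zip an unpacked ePub folder into a clean .epub archive."""
    with zipfile.ZipFile(epub_path, "w") as zf:
        # The mimetype entry must come first and must be stored uncompressed.
        zf.write(os.path.join(src_dir, "mimetype"), "mimetype",
                 compress_type=zipfile.ZIP_STORED)
        for root, dirs, files in os.walk(src_dir):
            # Skip the resource-fork folder that OS X zip tools may add.
            dirs[:] = sorted(d for d in dirs if d != "__MACOSX")
            for name in sorted(files):
                full = os.path.join(root, name)
                arcname = os.path.relpath(full, src_dir).replace(os.sep, "/")
                # Skip hidden files (.DS_Store etc.) and the mimetype
                # entry that has already been written.
                if name.startswith(".") or arcname == "mimetype":
                    continue
                zf.write(full, arcname, compress_type=zipfile.ZIP_DEFLATED)
```

A call would look like pack_epub("unpacked_book", "book.epub"), where both names are placeholders. The result should still be run through epubcheck afterwards.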

Working this closely with the ePub files revealed a few other things I will have to watch for in future projects using InDesign. Where Greek characters, or even roman characters with diacritics in French or German or transliterated Greek, appeared in the InDesign files, some had the xml:lang value for Arabic applied to them, but not all Greek was so tagged. I’ve checked the Word files from which the text was “placed” in InDesign, and these did not have a language designation, so this indiscriminate, sporadic addition of the incorrect Arabic tag is InDesign’s doing. Why a programmer would create a routine that sees a character from Unicode’s Latin Extended or Greek blocks and then applies the tag Arabic to it is quite a puzzler.
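
A quick way to see how widespread such stray tags are is to count them per file. The sketch below is an illustration only: count_arabic_tags is a hypothetical name, and the pattern assumes the tag value is "ar" or begins with "ar-" (the exact value InDesign wrote is not recorded in this post).

```python
import re
import zipfile
from collections import Counter

# Assumption: Arabic is tagged as xml:lang="ar" or a variant such as "ar-SA".
ARABIC_LANG = re.compile(r"""xml:lang\s*=\s*["']ar(?:-[A-Za-z0-9-]+)?["']""")


def count_arabic_tags(epub_path):
    """Count xml:lang attributes with an Arabic value in each XHTML file."""
    counts = Counter()
    with zipfile.ZipFile(epub_path) as zf:
        for name in zf.namelist():
            if name.lower().endswith((".xhtml", ".html")):
                text = zf.read(name).decode("utf-8", errors="replace")
                hits = len(ARABIC_LANG.findall(text))
                if hits:
                    counts[name] = hits
    return counts
```

The function only reports counts; whether to delete the attribute or replace it with the correct language code is a separate editorial decision.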

In any case, the .epub file finally passed epubcheck and also uploaded successfully to Lulu.com. Now we wait to see whether after a few weeks it is accepted for sale by the various vendors.

Donald Mastronarde

LIST OF ISSUES from the Lulu.com validator (headed by warning “The following issues were found in your EPUB, which can affect its eligibility for certain channels:”), with my comments added in brackets

Contains unmanifested files. (the iBookstore and Barnes & Noble) [This meant the file encryption.xml that InDesign had created when embedding fonts.]

Contains XML namespace errors. (the iBookstore and Barnes & Noble) [Apparently a red herring.]

Contains file permissions errors. (the iBookstore and Barnes & Noble) [Probably related to the encrypted font files, although the unix command ls -l didn’t seem to me to show any suspicious permissions.]

Contains invalid images. (the iBookstore and Barnes & Noble) [Again, a red herring, I think, unless it was the file that InDesign renamed with .jpeg in place of .jpg.]

The table of contents contains one or more links without text. (the iBookstore and Barnes & Noble) [I don’t think this was true either.]

Contains invalid guide section XML.  (Barnes & Noble) [The guide element of content.opf is optional, and InDesign did not create one, so this error was mysterious. I ended up adding a guide element, in case Barnes & Noble insists on having one, but I don’t think this was necessary, since I found a document explaining B&N formatting suggestions and it mentioned the guide element as optional too.]

Contains invalid creator XML. (Barnes & Noble) [InDesign output the creator element with no text; the author’s name had to be added manually.]

Contains invalid publication date XML. (Barnes & Noble) [Another red herring, as far as I can tell, since there was a valid date in the file created by InDesign.]

Contains invalid publisher XML. (Barnes & Noble) [InDesign output the publisher element with no text; the publisher’s name had to be added manually.]

There was at one stage also a warning about an empty title element; again this was what InDesign produced, and the title in content.opf had to be added manually.
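
Since several of these warnings trace back to metadata elements that InDesign left empty, a small check of content.opf before uploading may save a round trip. This is a sketch, not a substitute for the validator: empty_metadata is a hypothetical name, and the list of element names simply mirrors the ones named earlier in this post as apparently essential, not any official requirement.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

# Assumption: the elements listed in this post as seemingly essential.
REQUIRED = ["title", "creator", "description", "publisher",
            "date", "rights", "identifier"]


def empty_metadata(opf_text):
    """Return the names of Dublin Core elements that are missing or blank."""
    root = ET.fromstring(opf_text)
    problems = []
    for name in REQUIRED:
        elements = root.findall(f".//{{{DC}}}{name}")
        # Flag the name if no element exists or none has non-blank text.
        if not any((el.text or "").strip() for el in elements):
            problems.append(name)
    return problems
```

An empty list means every element in REQUIRED has some text; it says nothing about whether that text is in a form the vendors will accept.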