Further thoughts on ePub for scholarly publishing

This is a follow-up to the post Adventures in ePub Conversion.

After the completion of the print-quality PDF of Edward Courtney’s A Commentary on the Satires of Juvenal and publication by print on demand, the conversion to ePub within InDesign was relatively straightforward but not hassle-free. Here is a record of the steps and the issues encountered.

1. The 21 InDesign files that make up the book were copied to a new folder, a new InDesign book was created in that folder, and the files were added to the book.

2. With the All Documents setting in InDesign Find and Replace, Minion Pro font was replaced everywhere with Times New Roman. Since InDesign does not allow one to search for a font family without also specifying a font style, this takes three separate actions to replace regular, italic, and bold styles. [A cleaner and more efficient approach would be available if the paragraph styles (which originated in a collection of styles provided by a supportive editor at a press) had been better organized and hierarchically defined. With the experience of preparing two books, it will now be possible to devote some time to revising the style definitions to make future projects easier and more consistent, and a global change of font much simpler.]

3. The change of font caused many of the documents to have overset text errors. Each document had to be reviewed and a page added where necessary to accommodate the overset text.

4. In the document for the front matter, the Table of Contents was deleted, additional paragraphs were added (to title page, copyright page, etc.) to improve the spacing that would appear in the ePub display, the LCCN was deleted and the ISBN revised to that assigned to the ePub format. To break the front matter into sections, a special paragraph style was applied at the proper locations.

5. The print index had two columns in three portions of the text. For the ePub these had to be converted back to one column in each portion (the Split 2 setting being changed to None). This produced overset text errors, so many additional pages had to be added. The special format for splitting the document was added twice to separate the three indexes in the ePub.

6. Fortunately, the index is almost entirely made up of references by poem number and line number, and these never had to be edited for the print version or the ePub. There were, however, about 20 page references in the index. These had been altered to reflect the new pagination in the print version, and a list had been compiled of all such changes made in the index and in earlier parts of the book, where almost 80 other cross-references were changed. The page references now had to be altered back to the old pagination, which had been incorporated within the text. (This was far more efficient in this case than incorporating the new page numbers (555 of them) into the text as well.)

7. The 80 cross-references in the body of the book were changed. This took very little time because there was a compiled list to check and searching for “p. ” and “pp. ” was also an efficient means of checking.

8. In the front matter and index document some override (manual) headers had to be removed (the ePub generation in InDesign automatically removes the headers and page numbers on the Master pages). (This also had to be done in part of the document containing Satires 13 and 14. The commentary on each satire was a separate InDesign file, but an unexplained and unresolved pagination bug in the file for Satire 14 forced me to add that part of the commentary to the file for Satire 13.) The three maps had to have their titles moved from the header into the text itself.

9. With the All Documents setting, searches were performed for non-breaking hyphens and for manual line and page breaks, to determine whether any would cause problems in the display of the ePub.

10. After the above changes, it was necessary to repeat the font replacement searches, since some new paragraphs again contained Minion Pro in their style definition.
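The check for leftover page references can also be approximated outside InDesign. What follows is only a minimal sketch in Python, assuming the text has been exported to plain text or to the XHTML files of the ePub. The function name find_page_refs and the sample sentence are invented for illustration, and the pattern covers only the “p. ” and “pp. ” forms mentioned above.

```python
import re

# Matches "p. 123", "pp. 123-45", and "pp. 123–145" (hyphen or en dash).
# The leading \b keeps words that merely end in "p" (e.g. "cap.") from matching.
PAGE_REF = re.compile(r"\bpp?\.\s*(\d+)(?:\s*[-\u2013]\s*(\d+))?")


def find_page_refs(text):
    """Return (matched string, first page, last page or None) for each reference.

    An abbreviated last page such as "45" in "pp. 340-45" is returned as
    written; expanding it to 345 is left to the person checking the list.
    """
    refs = []
    for m in PAGE_REF.finditer(text):
        first = int(m.group(1))
        last = int(m.group(2)) if m.group(2) else None
        refs.append((m.group(0), first, last))
    return refs


# Invented sample; the reference to 7.210-12 is a poem-and-line reference
# and is deliberately not matched.
sample = "See p. 12 and pp. 340-45; cf. on 7.210-12 and pp. 101\u2013103."
for ref in find_page_refs(sample):
    print(ref)
```

The output of such a search would still have to be compared by hand against the compiled list of changed references, since the pattern cannot tell a reference to this book from a reference to the pages of some other work.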

After the above steps, an ePub was generated without embedding fonts (even with Minion Pro absent from the book, InDesign encrypted other fonts). Inspection of the result showed that further adjustments were required.

1. The three maps had been in graphics frames on pages of their own. These graphics needed to be changed to inline graphics, and the largest map needed to be resized.

2. There are half a dozen characters in the book that are not present in standard Times New Roman or many other fonts (e.g., y with breve), so KadmosU had to be added to the ePub. Oxygen XML Editor was used to add the Fonts folder and add the regular and italic styles.

3. BBEdit was used to edit content.opf, where much of the metadata is missing (needed to add author, title, description, license, etc.). The manifest entries for the font files needed to be created. The file toc.ncx was edited so that the correct chapter or section titles would be displayed in place of the file names. The css file had to have the font face declarations added for the KadmosU regular and italic.

4. The first part of the Introduction contained a transcription of an inscription that used underdots to indicate uncertain letters. In the ePub display some of the combined characters worked and some did not, with no rhyme or reason (one E with a dot below worked, but another did not). The easiest course was to change these few lines to a graphic.

5. It later emerged that a form of CUI in the note on Satire 7.210-12 in which both the U and the I were supposed to have a breve (U+016C and U+012C) would not display until the two vowels in the XML file were replaced with the decimal entities &#364;&#300;.

6. At one place there is a precomposed upsilon with breve and acute accent, supplied from KadmosU font. This shows up in Adobe Digital Editions, but not in Apple iBooks (which uses a rather ugly Greek font in any case, ignoring the embedded font).

7. At some places, it was necessary to edit the XML files to paste in a paragraph containing only a non-breaking space, to create proper spacing between headings and the text above or below.
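The replacement of the two breve vowels by decimal character references was done by hand, but for a book with more such characters it could be scripted. The sketch below is an illustration only, in Python; the function name to_decimal_entities and the choice of target characters are my own assumptions. It first normalizes to NFC so that a decomposed sequence (U followed by combining breve U+0306) is treated the same as the precomposed letter.

```python
import unicodedata


def to_decimal_entities(text, targets):
    """Replace each character listed in `targets` with a decimal reference.

    The text is normalized to NFC first, so "U" + combining breve becomes
    the single character U+016C before the lookup is made.
    """
    normalized = unicodedata.normalize("NFC", text)
    return "".join(
        f"&#{ord(ch)};" if ch in targets else ch for ch in normalized
    )


# U with breve (U+016C) and I with breve (U+012C), as in the form of CUI.
breve_vowels = {"\u016c", "\u012c"}
print(to_decimal_entities("C\u016c\u012c", breve_vowels))   # prints C&#364;&#300;
print(to_decimal_entities("CU\u0306I\u0306", breve_vowels))  # prints C&#364;&#300;
```

Whether a given reading system then displays the character still depends on the fonts it is willing to use, so this addresses only the encoding side of the problem.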

During this work, the command-line epubcheck utility was used at intervals, and there was never an error. In addition, at the last stage, a penultimate and a final version of the ePub file were uploaded to Lulu, and both passed the checks that Lulu’s system performs without error.

Is it worthwhile to create an ePub of scholarly books?

The format is definitely a significant step backwards in sophistication and capabilities in comparison with PDF or even old-fashioned “desktop publishing” from MS Word files. That is the cost of accommodating the mass market and supporting the notion of reading books on tiny screens of devices employing a relatively crippled OS.

The problems with combining characters such as underdots may mean that the format is useless for papyrological texts and for anything involving sophisticated representation of linguistic features (such as having macron or breve with other diacritics).

The problem of pagination and indexing is not trivial. Should ePub readers simply expect not to have an index with useful page references, since they can probably find some things by the search mechanism? The problem was more complicated in the case of the Kurke and Courtney books because they were repaginated reprints of previous editions. If a book is being produced from scratch, the situation will be a little different. After producing a PDF for print on demand, should one always incorporate the print page numbers into the text for the ePub version? Is there an easy and efficient way to produce an index with automated links?

For some of these questions, we await evidence of whether anyone will actually buy these books in this format. Clearly, if the sales are insignificant, it is foolish to devote the effort to creating the ePub version. Having learned a lot from the problems encountered the first time around, I was able to complete the processing described above in less than 8 hours. A few more hours were spent by an editorial assistant looking through the penultimate version to check for problems.

Edward Courtney’s Commentary on Juvenal now available

A Commentary on the Satires of Juvenal by Edward Courtney, Gildersleeve Professor of Classics Emeritus at the University of Virginia, is now available at the CCS sales site in both print-on-demand and ePub formats. The open-access page-view is now also available on the eScholarship site.

This is a reprint of the 1980 edition, with corrections and minor additions provided by the author. Courtney’s study of the Satires of Juvenal is the only full-scale commentary on the corpus since the nineteenth century and retains its value for students and scholars a generation after its first appearance.

After an embargo period of two years, the full PDF will become accessible for free download at the open-access site.

Adventures in ePub conversion

The ePub version of Leslie Kurke’s The Traffic in Praise (ISBN 9781939926012) has now been made available for sale on Lulu.com and submitted to a process of validation for sales through other channels, such as Barnes and Noble and Apple iBookstore (it may take several weeks before the item is approved and available on those sites).

The process of creating the ePub turned out to be more frustrating than expected. The scholarly series that were in partnership with the California Digital Library and UC Press were able to pay a fee to have their book converted to ePub format once the print edition had been completed. Lulu.com is now the publishing services partner instead of UC Press, and I had just assumed the same situation would apply. Lulu.com, however, offers conversion for free, but only from Word files, and if you have been working with InDesign, as I was, you are on your own. (There were of course Word files at an earlier stage, but too much had been edited and formatted since that time for them to be of any use now.)

Initially, then, I read some short explanations of creating an ePub from InDesign (the command to Export book to ePub is right next to the command to Export book to PDF), and I went ahead and generated the file, which required only minor tinkering after generation. Or so I thought.

(1) The special ePub table of contents file (toc.ncx) had to be revised so that chapter names would be present rather than the default filenames of all the separate xhtml documents that make up the parts of the book in this format.

(2) Page numbers, blank pages, manual page breaks, and any manually added running heads (added by override, as was necessary in the front matter file and in the index file) had to be edited out.

(3) For proper display of the front matter elements on separate pages, it was necessary to add a new paragraph style used only at the break points and then use the settings in InDesign to split the ePub files at that paragraph style.

After a bit of learning and experimentation, I had a version that looked very good in Adobe Digital Editions and also pretty good in iBooks for Mac OS X, and a helpful colleague verified it on an iOS device. The Adobe reader had the best presentation, in that it respected the links between footnote numbers and the footnotes (which appear at the end of each chapter in this format: the only choice other than all at the end of the book), so one could automatically go back and forth between text and footnote. In iBooks the links did not function.

The next step, however, was to get this format for sale on Lulu.com and on other eBook marketers, and to do that it had to pass validation when uploaded to Lulu. Having a little experience with validators for javascript, html, xml, xslt, and fonts, I am aware that some validators are better than others in giving actionable feedback as to what is invalid. In certain circumstances, a validator may give a very opaque report, and some errors will not be real at all, but only reported because something went wrong earlier in the process of checking the file. Lulu’s validator immediately returned an upsetting number of errors that would block distribution to Barnes and Noble and iBookstore and even simply uploading and selling at Lulu itself. This list included illegal image files, namespace errors, file permission errors, and files missing from the manifest. (See the end of this post for details.) The file that I thought was all ready was not going public yet.

Unfortunately, Lulu’s help documents are not helpful about these errors, and the frustrated users who ask questions at their users’ forum often get little help, except from people trying to sell services. I learned to use epubcheck 3.01 from the command line in Terminal, and I could make sufficient revisions to pass epubcheck, but still fail validation at Lulu (which in fact uses epubcheck and then makes some additional tests specific to Nook and iBooks). I also compared the files and declarations in a couple of free ePub books to see how they differed from what I had generated from InDesign. Since the semester was beginning and I had enough other things to keep me busy, I was sorely tempted to try to hire a professional to solve the problem.

Eventually I found discussions on Adobe help forums and elsewhere that solved the problem. The main problem is the font embedding setting in InDesign. In the PDF that generates the POD hard copies, one must embed the fonts, and in the PDF the fonts are embedded as partial sets and are encrypted to protect the font-maker’s property rights. Our book was set in Minion Pro from Adobe, with the metrical breve symbol (used twice) supplied from KadmosU font, since Minion Pro doesn’t contain this symbol. When the embedding setting is used, InDesign adds a file encryption.xml within META-INF (which in other ePub files contains only one file, container.xml) and also encrypts the fonts. encryption.xml is not in the manifest, and although epubcheck 3.01 doesn’t mind its presence, the additional validators do, and the encrypted files are rejected as well. There is an excellent short video tutorial on this issue from Lynda.com at http://www.youtube.com/watch?v=_bWXfFsdSYw .
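A check for files of this kind can be scripted. The following is a rough sketch in Python using only the standard library, not a description of what Lulu’s validator actually does. The function name unmanifested_files is invented, and the decision to report everything in META-INF other than container.xml is my own assumption, modeled on the complaint about encryption.xml.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from urllib.parse import unquote

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def unmanifested_files(epub_path):
    """List archive members that are not declared in the OPF manifest.

    The mimetype file, META-INF/container.xml, and the OPF file itself are
    expected to be absent from the manifest and are not reported.
    """
    with zipfile.ZipFile(epub_path) as zf:
        names = [n for n in zf.namelist() if not n.endswith("/")]
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", CONTAINER_NS).get("full-path")
        package = ET.fromstring(zf.read(opf_path))

    # Manifest hrefs are relative to the folder that holds the OPF file.
    opf_dir = posixpath.dirname(opf_path)
    listed = {
        posixpath.normpath(posixpath.join(opf_dir, unquote(item.get("href"))))
        for item in package.findall("opf:manifest/opf:item", OPF_NS)
        if item.get("href")
    }
    expected = {"mimetype", "META-INF/container.xml", opf_path}
    return sorted(n for n in names if n not in listed and n not in expected)


# Usage (the file name is hypothetical):
# for name in unmanifested_files("book.epub"):
#     print("not in manifest:", name)
```

For an ePub exported with font embedding turned on, a check of this kind would be expected to list META-INF/encryption.xml.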

So the solution turned out to be as follows:

(1) I did not embed fonts when exporting to ePub from InDesign.

(2) I edited the .epub archive (which is actually a .zip with a different suffix) with BBEdit and Oxygen XML Editor. The latter is great because one can change file names and even add files to the archive. [Side note: thus you don’t have to unzip and zip the archive, which is a nuisance in OS X. If you change the suffix to .zip, double-clicking will not unzip it. You have to use the command line in Terminal. But then you also have to worry about compressing into a clean archive without the hidden files that OS X puts in zip archives created directly in the Finder.]

(3) InDesign leaves the title and several other important metadata elements blank in the content.opf file, so you need to supply the text for these elements (title, creator, description, publisher, date, rights, and identifier seem to be the essential ones for validation).

(4) InDesign renamed one of the two images, giving it the suffix .jpeg (both had been .jpg). I’m not sure whether this was necessary, but in Oxygen I changed the suffix back to .jpg and changed the corresponding references to this file in the manifest in content.opf and in the xhtml file for the title page where the href for this logo image was used.

(5) In toc.ncx I replaced the navMap element with a version that had the chapter titles as I wanted them to appear in the ePub TOC.

(6) I created a folder “fonts” in the archive (with Oxygen) and added to that folder the .ttf file for the regular version of KadmosU, which is embeddable by license with no requirement for encryption. I then added a declaration of this font file to the manifest in content.opf. To make sure all eBook readers pay attention to this font, it is also necessary to add a declaration of this font in a font-face entry in the css file (this is explained with an example of the syntax in the video tutorial referenced above).

(7) If I had been starting from scratch, I would have replaced Minion Pro in the InDesign styles with something else (like Gentium), and added Gentium in the same way. But since I had spent excessive time on this already, I simply left the styles as they were. The eBook readers will use a default serif font. One cannot manually add an unencrypted Minion Pro font file since this is a font with license restrictions. Adobe was thus legally correct to insist on encrypting it, but InDesign is very unhelpful in encrypting fonts that don’t need to be encrypted and in giving no warning that encrypted fonts will actually be a problem EXCEPT in Adobe Digital Editions (or at least will offer validation problems, even if the readers would not be bothered).
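The two font declarations described above (the manifest entry and the font-face entry in the css file) have a simple shape, and the sketch below generates both as text. It is an illustration in Python under stated assumptions: the file name KadmosU.ttf, the id font-kadmosu, the relative path, and the media type application/vnd.ms-opentype are all placeholders chosen by me, and the right media type and path depend on the ePub version and on where the css file sits in the archive.

```python
from xml.sax.saxutils import quoteattr


def manifest_item(item_id, href, media_type):
    """Build one <item> element for the manifest in content.opf."""
    return (
        f"<item id={quoteattr(item_id)} href={quoteattr(href)} "
        f"media-type={quoteattr(media_type)}/>"
    )


def font_face_rule(family, src, style="normal", weight="normal"):
    """Build an @font-face rule; `src` is relative to the css file."""
    return (
        "@font-face {\n"
        f'  font-family: "{family}";\n'
        f"  font-style: {style};\n"
        f"  font-weight: {weight};\n"
        f'  src: url("{src}");\n'
        "}"
    )


# Hypothetical names and paths, for illustration only.
print(manifest_item("font-kadmosu", "fonts/KadmosU.ttf", "application/vnd.ms-opentype"))
print(font_face_rule("KadmosU", "fonts/KadmosU.ttf"))
```

The generated lines would still have to be pasted into content.opf and the css file, and the paragraph or character styles that need the font must name the same font-family.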

Working this closely with the ePub files revealed a few other things I will have to watch for in future projects using InDesign. When Greek characters, or even roman characters with diacritics in French or German or transliterated Greek, appear in the InDesign files, some had the xml:lang Arabic applied to them, but not all Greek was so tagged. I’ve checked the Word files from which the text was “placed” in InDesign, and these did not have a language designation, so this indiscriminate sporadic addition of the incorrect Arabic tag is InDesign’s doing. Why a programmer would create a routine that sees a character from Unicode’s Latin Extended or Unicode’s Greek blocks and then applies the tag Arabic to it is quite a puzzler.
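Stray language tags of this kind can be located with a short script. The sketch below, in Python, is a guess at a workable check rather than a tested tool: it assumes each xhtml file parses as well-formed XML (a file with undefined named entities would need other handling), it assumes the tag appears as an xml:lang value beginning with “ar”, and the function names are invented.

```python
import unicodedata
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def has_arabic(text):
    """True if any character's Unicode name marks it as Arabic script."""
    return any("ARABIC" in unicodedata.name(ch, "") for ch in text)


def suspicious_arabic_tags(xhtml):
    """Return (tag, lang, text) for elements tagged Arabic with no Arabic text."""
    root = ET.fromstring(xhtml)
    hits = []
    for el in root.iter():
        lang = el.get(XML_LANG, "")
        if lang.lower().split("-")[0] == "ar":
            text = "".join(el.itertext())
            if not has_arabic(text):
                hits.append((el.tag.split("}")[-1], lang, text))
    return hits


# Invented sample: one Greek word wrongly tagged, one genuine Arabic word.
sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>'
    '<span xml:lang="ar-SA">\u03bb\u03cc\u03b3\u03bf\u03c2</span> and '
    '<span xml:lang="ar">\u0643\u062a\u0627\u0628</span>'
    "</p></body></html>"
)
print(suspicious_arabic_tags(sample))
```

Only the first span in the sample is reported, since the second really does contain Arabic letters.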

In any case, the .epub file finally passed epubcheck and also uploaded successfully to Lulu.com. Now we wait to see whether after a few weeks it is accepted for sale by the various vendors.
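For anyone who does unzip the archive rather than editing it in place, the worry about hidden OS X files mentioned in the side note above can be handled by rebuilding the archive with a script. This is a minimal sketch in Python; the function name pack_epub is invented, and the list of files to skip is my own assumption about what the Finder tends to add. It writes the mimetype file first and uncompressed, as the ePub container format requires.

```python
import os
import zipfile

SKIP_NAMES = {".DS_Store"}


def pack_epub(source_dir, epub_path):
    """Zip an unpacked ePub folder: mimetype first and stored, no OS X extras.

    `epub_path` should lie outside `source_dir`, or the archive would try
    to include itself.
    """
    with zipfile.ZipFile(epub_path, "w") as zf:
        zf.write(
            os.path.join(source_dir, "mimetype"),
            "mimetype",
            compress_type=zipfile.ZIP_STORED,
        )
        for folder, dirs, files in os.walk(source_dir):
            # Prune the resource-fork folder that OS X zip tools can leave behind.
            dirs[:] = sorted(d for d in dirs if d != "__MACOSX")
            for name in sorted(files):
                full = os.path.join(folder, name)
                arcname = os.path.relpath(full, source_dir).replace(os.sep, "/")
                if arcname == "mimetype" or name in SKIP_NAMES or name.startswith("._"):
                    continue
                zf.write(full, arcname, compress_type=zipfile.ZIP_DEFLATED)


# Usage (both paths are hypothetical):
# pack_epub("book_unpacked", "book.epub")
```

The result should still be run through epubcheck, since a clean archive says nothing about the validity of the files inside it.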

Donald Mastronarde

LIST OF ISSUES from the Lulu.com validator (headed by warning “The following issues were found in your EPUB, which can affect its eligibility for certain channels:”), with my comments added in italics

Contains unmanifested files. (the iBookstore and Barnes & Noble) [This meant the file encryption.xml that InDesign had created when embedding fonts.]

Contains XML namespace errors.  (the iBookstore and Barnes & Noble) [Apparently a red herring.]

Contains file permissions errors.  (the iBookstore and Barnes & Noble) [Probably related to the encrypted font files, although the unix command ls -l didn’t seem to me to show any suspicious permissions.]

Contains invalid images.  (the iBookstore and Barnes & Noble) [Again, a red herring, I think, unless it was the file that InDesign renamed with .jpeg in place of .jpg.]

The table of contents contains one or more links without text.  (the iBookstore and Barnes & Noble) [I don’t think this was true either.]

Contains invalid guide section XML.  (Barnes & Noble) [The guide element of content.opf is optional, and InDesign did not create one, so this error was mysterious. I ended up adding a guide element, in case Barnes & Noble insists on having one, but I don’t think this was necessary, since I found a document explaining B&N formatting suggestions and it mentioned the guide element as optional too.]

Contains invalid creator XML. (Barnes & Noble) [InDesign output the creator element with no text; the author’s name had to be added manually.]

Contains invalid publication date XML. (Barnes & Noble) [Another red herring, as far as I can tell, since there was a valid date in the file created by InDesign.]

Contains invalid publisher XML. (Barnes & Noble) [InDesign output the publisher element with no text; the publisher’s name had to be added manually.]

There was at one stage also a warning about an empty title element; again this was what InDesign produced, and the title in content.opf had to be added manually.
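Since several of these complaints came down to metadata elements that InDesign left without text, a quick check of content.opf before uploading would have caught them. The following Python sketch is offered only as an illustration: the function name empty_metadata is invented, and the list of element names is simply the set that seemed essential for validation in my experience, not an official requirement.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
# Element names taken from my own experience with the validator, not from a spec.
REQUIRED = ["title", "creator", "description", "publisher", "date", "rights", "identifier"]


def empty_metadata(opf_xml, names=REQUIRED):
    """Report Dublin Core elements that are missing or have no text."""
    root = ET.fromstring(opf_xml)
    problems = []
    for name in names:
        elements = root.findall(f".//{{{DC}}}{name}")
        if not elements:
            problems.append((name, "missing"))
        elif any(not (el.text or "").strip() for el in elements):
            problems.append((name, "empty"))
    return problems


# Invented sample resembling the blank elements described above.
sample_opf = (
    '<package xmlns="http://www.idpf.org/2007/opf" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0"><metadata>'
    "<dc:title/><dc:creator></dc:creator><dc:date>2013-09-01</dc:date>"
    '<dc:identifier id="bookid">urn:uuid:hypothetical</dc:identifier>'
    "</metadata></package>"
)
for name, state in empty_metadata(sample_opf):
    print(name, state)
```

For the sample, title and creator are reported as empty, and description, publisher, and rights as missing.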

Leslie Kurke’s The Traffic in Praise published in digital edition

California Classical Studies has published its first volume. Leslie Kurke’s The Traffic in Praise: Pindar and the Poetics of Social Economy is a new digital edition, with minor corrections, of her 1990 monograph.

The Traffic in Praise is now available for purchase as a Print On Demand paperback at the CCS sales site. In addition, the open-access version is available at the CCS section of the eScholarship repository of the University of California. Besides the page view mechanism there, the full PDF can be downloaded, since there is no temporary embargo for this first title in the series.

An ePub version of the book is also in preparation for release soon.

Progress report

Our first two volumes, the reprints with corrections of Leslie Kurke’s The Traffic in Praise and Ted Courtney’s A Commentary on the Satires of Juvenal, are now in first page proofs. That means that scanning and proofreading after OCR have been completed and the resulting files flowed into Adobe InDesign. Publication is tentatively set for September. In a separate forthcoming post, there will be some discussion of the pitfalls encountered and solutions used.

Progress report as of Nov. 25, 2012

Content is being added to this website; the process is not complete.

A designer has been commissioned to create a logo, which is thus not yet present here, but will be added soon.

The Submission Questionnaire will be uploaded soon. Submissions will be welcome as of Jan. 1, 2013.

A draft of the Author Agreement has been developed.

Setup will soon be in progress for the CCS site at eScholarship.org, and then for the eScholarship PLUS sales site (powered by Lulu.com).