Further thoughts on ePub for scholarly publishing

This is a follow-up to the post Adventures in ePub Conversion.

After the completion of the print-quality PDF of Edward Courtney’s A Commentary on the Satires of Juvenal and publication by print on demand, the conversion to ePub within InDesign was relatively straightforward but not hassle-free. Here is a record of the steps and the issues encountered.

1. The 21 InDesign files that make up the book were copied to a new folder, a new InDesign book was created in that folder, and the files were added to the book.

2. With the All Documents setting in InDesign Find and Replace, the Minion Pro font was replaced everywhere with Times New Roman. Since InDesign does not allow one to search for a font family without also specifying a font style, this took three separate actions, one each for the regular, italic, and bold styles. [A better and more efficient method would have been available if the paragraph styles (which originated in a collection of styles provided by a supportive editor at a press) had been better organized and hierarchically defined. With the experience of preparing two books, it will now be possible to devote some time to revising the style definitions so that future projects are easier and more consistent, and a global change of font is much simpler.]

3. The change of font caused many of the documents to have overset text errors. Each document had to be reviewed and a page added where necessary to accommodate the overset text.

4. In the document for the front matter, the Table of Contents was deleted, additional paragraphs were added (to the title page, copyright page, etc.) to improve the spacing that would appear in the ePub display, the LCCN was deleted, and the ISBN was revised to that assigned to the ePub format. To break the front matter into sections, a special paragraph style was applied at the proper locations.

5. The print index had two columns in three portions of the text. For the ePub these had to be converted back to one column in each portion (the Split 2 setting being changed to None). This produced overset text errors, so many additional pages had to be added. The special paragraph style for splitting the document was applied twice to separate the three indexes in the ePub.

6. Fortunately, the index is almost entirely made up of references by poem number and line number, and these never had to be edited for the print version or the ePub. There were, however, about 20 page references in the index. These had been altered to reflect the new pagination in the print version, and a list had been compiled of all such changes made in the index and in earlier parts of the book, where almost 80 other cross-references were changed. The page references now had to be altered back to the old pagination, which had been incorporated within the text. (In this case, this was far more efficient than also incorporating the new page numbers, 555 of them, into the text.)

7. The almost 80 cross-references in the body of the book were likewise changed back. This took very little time because there was a compiled list to check, and searching for “p. ” and “pp. ” was also an efficient means of checking.

8. In the front matter and index documents, some override (manual) headers had to be removed; the ePub generation in InDesign automatically removes only the headers and page numbers on the Master pages. This also had to be done in part of the document containing Satires 13 and 14. (The commentary on each satire was a separate InDesign file, but an unexplained and unresolved pagination bug in the file for Satire 14 forced me to add that part of the commentary to the file for Satire 13.) The three maps had to have their titles moved from the header into the text itself.

9. With the All Documents setting, searches were performed for non-breaking hyphens and for manual line and page breaks, to determine whether any would cause problems in the display of the ePub.

10. After the above changes, it was necessary to repeat the font replacement searches, since some new paragraphs again contained Minion Pro in their style definition.

After the above steps, an ePub was generated without embedding fonts (even with Minion Pro absent from the book, InDesign encrypted other fonts). Inspection of the result showed that further adjustments were required.

1. The three maps had been in graphics frames on pages of their own. These graphics needed to be changed to inline graphics, and the largest map needed to be resized.

2. There are half a dozen characters in the book that are not present in standard Times New Roman or many other fonts (e.g., y with breve), so KadmosU had to be added to the ePub. Oxygen XML Editor was used to add the Fonts folder and add the regular and italic styles.

3. BBEdit was used to edit content.opf, where much of the metadata was missing (author, title, description, license, etc. had to be added). The manifest entries for the font files needed to be created. The file toc.ncx was edited so that the correct chapter or section titles would be displayed in place of the file names. The CSS file had to have the font-face declarations added for KadmosU regular and italic.

4. The first part of the Introduction contained a transcription of an inscription that used underdots to indicate uncertain letters. In the ePub display some of the combined characters worked and some did not, with no rhyme or reason (one E with a dot below worked, but another did not). The easiest course was to change these few lines to a graphic.

5. It later emerged that a form of CUI in the note on Satire 7.210-12, in which both the U and the I were supposed to have a breve (U+016C and U+012C), would not display until the two vowels in the XML file were replaced with the decimal entities &#364;&#300;.

6. At one place there is a precomposed upsilon with breve and acute accent, supplied from KadmosU font. This shows up in Adobe Digital Editions, but not in Apple iBooks (which uses a rather ugly Greek font in any case, ignoring the embedded font).

7. At some places, it was necessary to edit the XML files to paste in a paragraph containing only a non-breaking space, to create proper spacing between headings and the text above or below.
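
The hand edits described in items 3 and 5 above could also be generated by a short script. The following is only a sketch under stated assumptions, not a record of what was actually done: the font file names, the Fonts/ path, and the media type are hypothetical placeholders, and the appropriate media type depends on the ePub version and the font format.

```python
from xml.sax.saxutils import quoteattr

# Hypothetical font files and styles; the real names in the ePub may differ.
FONTS = [
    ("Fonts/KadmosU.ttf", "normal"),
    ("Fonts/KadmosU-Italic.ttf", "italic"),
]


def manifest_items(fonts, media_type="application/vnd.ms-opentype"):
    """Return one OPF manifest <item> line per font file.

    The media type is an assumption; check it against the ePub version in use.
    """
    lines = []
    for n, (href, _style) in enumerate(fonts, start=1):
        lines.append(
            f"<item id=\"font{n}\" href={quoteattr(href)} "
            f"media-type={quoteattr(media_type)}/>"
        )
    return lines


def font_face_rules(family, fonts):
    """Return CSS @font-face declarations, one per font file.

    The url() paths are assumed to be relative to the CSS file's location.
    """
    rules = []
    for href, style in fonts:
        rules.append(
            "@font-face {\n"
            f"  font-family: \"{family}\";\n"
            f"  font-style: {style};\n"
            "  font-weight: normal;\n"
            f"  src: url(\"{href}\");\n"
            "}"
        )
    return "\n".join(rules)


def to_decimal_entities(text):
    """Replace every non-ASCII character with a decimal character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print("\n".join(manifest_items(FONTS)))
    print(font_face_rules("KadmosU", FONTS))
    # U with breve (U+016C) and I with breve (U+012C), as in item 5.
    print(to_decimal_entities("C\u016c\u012c"))
```

The same to_decimal_entities helper turns a non-breaking space (U+00A0) into &#160;, which is one way to write the spacing paragraphs mentioned in item 7.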

During this work, the command-line epubcheck utility was used at intervals, and there was never an error. In addition, at the last stage, a penultimate and a final version of the ePub file were uploaded to Lulu, and both uploads passed the checks that Lulu’s system performs without error.
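
For anyone who wants to repeat the epubcheck step after each round of hand editing, a small wrapper is one option. This is a minimal sketch, assuming that Java is installed, that epubcheck is available as a jar file, and that an exit status of zero means no errors were reported; the file names are placeholders.

```python
import subprocess


def epubcheck_command(jar_path, epub_path):
    """Build the command line for running epubcheck on one ePub file."""
    return ["java", "-jar", str(jar_path), str(epub_path)]


def run_epubcheck(jar_path, epub_path):
    """Run epubcheck and return (passed, report text).

    Assumes that a zero exit status means no errors were reported.
    """
    result = subprocess.run(
        epubcheck_command(jar_path, epub_path),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stdout + result.stderr


if __name__ == "__main__":
    # Placeholder paths; substitute the real jar and ePub locations.
    passed, report = run_epubcheck("epubcheck.jar", "book.epub")
    print("OK" if passed else "Problems found")
    print(report)
```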

Is it worthwhile to create an ePub of scholarly books?

The format is definitely a significant step backwards in sophistication and capabilities in comparison with PDF or even old-fashioned “desktop publishing” from MS Word files. That is the cost of accommodating the mass market and supporting the notion of reading books on tiny screens of devices employing a relatively crippled OS.

The problems with combining characters such as underdots may mean that the format is useless for papyrological texts and for anything involving sophisticated representation of linguistic features (such as having macron or breve with other diacritics).
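
One way to see which characters depend on combining marks is to normalize the text to NFC and list the marks that remain. This is a diagnostic sketch only. It rests on the assumption, not confirmed by anything observed in this project, that reading software is more likely to fail on sequences with no precomposed form.

```python
import unicodedata


def leftover_combining_marks(text):
    """List (base + mark, mark name) for marks that NFC cannot compose away."""
    composed = unicodedata.normalize("NFC", text)
    found = []
    for i, ch in enumerate(composed):
        if unicodedata.combining(ch) != 0:
            base = composed[i - 1] if i > 0 else ""
            found.append((base + ch, unicodedata.name(ch, "UNKNOWN")))
    return found


if __name__ == "__main__":
    # Latin E + combining dot below has a precomposed form (U+1EB8).
    print(leftover_combining_marks("E\u0323"))
    # Greek alpha + combining dot below has no precomposed form.
    print(leftover_combining_marks("\u03b1\u0323"))
```

Any sequences the function reports are candidates for testing in each reading system, or for replacement with a graphic as was done for the inscription.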

The problem of pagination and indexing is not trivial. Should ePub readers simply expect not to have an index with useful page references, since they can probably find some things by the search mechanism? The problem was more complicated in the case of the Kurke and Courtney books because they were repaginated reprints of previous editions. If a book is being produced from scratch, the situation will be a little different. After producing a PDF for print on demand, should one always incorporate the print page numbers into the text for the ePub version? Is there an easy and efficient way to produce an index with automated links?
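
As a thought experiment on the last question, page references could in principle be turned into links by a script if the print page numbers were present in the ePub files as anchors. The sketch below assumes a hypothetical anchor convention (an id of the form page123) and hypothetical file names. It was not part of this project, and it links only the first number in a range such as “pp. 12-14”.

```python
import re

# Hypothetical convention: each print page start is marked with id="pageN".
PAGE_ANCHOR = re.compile(r'id="page(\d+)"')
# Matches "p. 12" or "pp. 12"; only the first number of a range is linked.
PAGE_REF = re.compile(r"\b(pp?\.\s)(\d+)")


def build_page_map(files):
    """Map page number -> file name, given {file name: XHTML text}."""
    page_map = {}
    for name, text in files.items():
        for match in PAGE_ANCHOR.finditer(text):
            page_map[int(match.group(1))] = name
    return page_map


def link_page_refs(index_text, page_map):
    """Wrap known page numbers in links; leave unknown ones untouched."""
    def replace(match):
        page = int(match.group(2))
        target = page_map.get(page)
        if target is None:
            return match.group(0)
        return f'{match.group(1)}<a href="{target}#page{page}">{page}</a>'

    return PAGE_REF.sub(replace, index_text)


if __name__ == "__main__":
    files = {"sat1.xhtml": '<p><a id="page77"/>Some commentary</p>'}
    print(link_page_refs("see p. 77 and p. 900", build_page_map(files)))
```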

For some of these questions, we await evidence of whether anyone will actually buy these books in this format. Clearly, if the sales are insignificant, it is foolish to devote the effort to creating the ePub version. Having learned a lot from the problems encountered the first time around, I was able to complete the processing described above in less than 8 hours. A few more hours were spent by an editorial assistant looking through the penultimate version to check for problems.