Gatwood Publishing

Publishing The Hard Way Part II: Electronic Publishing (Herein lies the path to madness)

The fun thing about producing a book in electronic and paper form is that you get to do all the formatting twice. If you’re lucky. In reality, you’ll do it far more than twice. However, most of the formatting work falls into one of two major buckets: formatting for electronic delivery and formatting for dead tree (paper) delivery. This section is all about the electronic delivery. I’ll save the dead trees for later.

In the minds of a lot of readers, electronic books cost nothing to make, and so should be nearly free. Most people think, “It’s no different than making a web page, and anybody can do that.” So why, then, do eBooks cost a lot of money? What makes them so tricky to produce that publishers spend more time and effort on them than on their print editions?

I’ll explore the pains of producing a high-quality eBook that attempts to replicate the design and feel of the print editions as much as possible in this article, the second in my series of articles on publishing the hard way. (Stay tuned for articles about other subjects, including affiliate programs and print publishing. And if you missed it, be sure to read Part I: Writing Over the Long Haul—What Went Right And What Went Wrong.)

Kindle is not a Kindle
How can devices made by one company be so inconsistent?

As an author, I have sort of a love-hate relationship with Kindle. On the one hand, it is the most popular reading platform out there, at last check. This is in large part because it has a fairly rich set of end-user features, and in large part because Amazon provided a complete infrastructure to back it up, from publishing tools to sales and distribution, and did so long before anybody else did. So as an author, you have to love the platform for its popularity.

On the other hand, its consistency often leaves something to be desired. Amazon has a long history of basically forking their code for each new device, and not keeping the code in sync so that they all behave the same way. Their built-in styles vary from device to device, and the way that they handle content also varies. For example, it took a lot of trickery to produce an EPUB that can show white-on-black text on most devices. Even when I got it working, it still showed up as black-on-white text on a few buggy, old devices (and, until they fixed the bug at my request, as black-on-black in their “Look Inside” feature).

Different devices use different base font sizes and scaling, so points/pixels aren’t points/pixels (what standards?), and everything has to be done in ems. This entire design makes little sense, as most rendering engines have no trouble doing relative font scaling even with hard-coded font sizes, but apparently nobody told Amazon that. Amazon also has lots of arbitrary CSS handling rules that some readers enforce and other readers don’t. For example, Amazon’s publishing guidelines specify that the line height must be at least 1.2em. With some fonts, on some readers, this can go very wrong, resulting in wildly varying baselines in ways that IMO grossly violate the CSS spec.
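
As a rough illustration of the em-only approach, a stylesheet might look something like the sketch below. The p.chaptertitle class and the specific sizes are hypothetical, and it assumes the 1.2em minimum applies to line-height.

    body {
        font-size: 1em;        /* take the reader's base size; no pt or px */
        line-height: 1.2em;    /* assumed minimum from the guidelines */
    }

    p.chaptertitle {           /* hypothetical class name */
        font-size: 1.5em;      /* relative to the base size, whatever it is */
        line-height: 1.2em;    /* recomputed against this element's own font size */
        margin: 2em 0 1em 0;   /* spacing in ems so it scales with the text */
    }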

And older Kindle devices use a different underlying file format than newer Kindle devices, which means your content gets interpreted very differently depending on hardware. Newer devices handle CSS and generally work a lot like modern EPUB readers, but with a lot of odd quirks. Older devices ignore CSS, and rely on a tool called “kindlegen” to translate that CSS into basic HTML (using tags for bold, italic, font changes, and so on). Because HTML tags can’t support the full feature set of CSS, a lot gets lost in the process.

Worse, the kindlegen tool is rather amazingly buggy in the way it interprets CSS, often requiring you to dumb down the rules severely just to get correct results. If you want kindlegen to work, you should never use more than one element in your selectors, and you should never use more than one class on any element. For example, in my EPUB styles, I use a style that looks like this:

    blockquote p + p {
        text-indent: 1.5em !important;
    }

To prevent kindlegen from choking, I had to dumb that style down to this:

    p.kindleblockquotefirstinsection {
        text-indent: 0 !important;
        ...
    }

where the first paragraph is explicitly tagged with a class. As the rules became more complex, this resulted in such lovely names as p.centernoindenthalfspace. You can imagine just how much I love this platform....

The reason for this is that when kindlegen translates p + p, it tries to apply the rule to every paragraph tag: it applies it to both the first tag and the second tag, instead of just to the second tag relative to the first. In many cases, this doesn’t matter, but in many others, it does, resulting in very bizarre-looking output.
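
To make the workaround concrete, here is a sketch of how the sibling rule might be flattened into single-element, single-class selectors. Only p.kindleblockquotefirstinsection appears in the styles above; p.kindleblockquoterest is a hypothetical name made up for this example, and the markup would have to tag every paragraph explicitly with one class or the other.

    /* First paragraph in a block quote section: no indent. */
    p.kindleblockquotefirstinsection {
        text-indent: 0 !important;
    }

    /* Every later paragraph gets the indent that "blockquote p + p"
       used to supply. Hypothetical class name. */
    p.kindleblockquoterest {
        text-indent: 1.5em !important;
    }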

The newer Kindle devices are more consistent than the older ones, so if you are really clever, you can wrap many of your styles in media queries to limit them to only KF8-capable devices, and avoid many of these problems. If Amazon would spend the engineering resources to update as many of their readers as possible to current versions of their software, the Kindle platform would be a decent platform to work with. Right now, it hurts. A lot.
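
Amazon’s publishing guidelines document the amzn-kf8 and amzn-mobi media types for this purpose. A minimal sketch follows; the span.dropcap class and all of the values are hypothetical and only show the shape of the technique.

    /* Applied only by KF8-capable readers. */
    @media amzn-kf8 {
        span.dropcap {
            float: left;
            font-size: 3em;
            line-height: 1em;
        }
    }

    /* Applied only when the content is built for the older Mobi format. */
    @media amzn-mobi {
        span.dropcap {
            font-size: 1.5em;    /* simple fallback: a larger first letter, no float */
        }
    }

Rules outside both blocks still apply everywhere, so it probably makes sense to keep the shared baseline as plain as possible and put anything fancy inside the KF8 block.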

Probably the biggest problem with the Kindle platform is how crippled their iOS reader is. It does not fully handle the KF8 format, instead behaving like a curious hybrid that is almost like the e-paper readers, but not quite. I find it baffling that the world’s most popular reading hardware platform has the worst support from Amazon. This could explain why Apple’s iBooks is so rapidly gaining ground.

Additionally, I’ve found bugs that crash the Kindle Mac reader reliably. I reported at least one of these bugs a year ago, and they haven’t updated the Mac reader in all that time. So don’t count on fixes if you encounter bugs. (And I won’t even mention the possible security ramifications of such bugs.)

In short, the Kindle platform is a mess, and even after you get things working, you’ll find yourself constantly worrying that Amazon is going to turn around and break something else, either accidentally or on purpose, resulting in insane amounts of re-engineering and re-design. If your books are really simple, you probably won’t care. If you’re doing drop caps, or anything else remotely complex, you should be prepared for a lot of pain. And if you aren’t a software engineer, you should just avoid doing more than the most basic styling if you can.

And then, there’s KDP.... More on that later.

Lord, Come to my ADE.
Why is this piece of junk ignoring all my styles?

Adding to the pain of every eBook creator is a piece of software called Adobe Digital Editions, or ADE for short. This application is relatively unknown to most users, yet a decent percentage of people who read eBooks actually use ADE indirectly, in one of many skinned varieties distributed by companies like Barnes & Noble (Nook) and Kobo. These vendors often add their own bugs on top of ADE’s existing bugs.

What makes Adobe Digital Editions so insidious is that at first glance, it looks like a well-behaved eBook reader, until that moment where it suddenly isn’t. You put in an innocent anchor tag to serve as a link destination, and you put text inside it, and... wait, why is this underlined and blue? You make a small typo in a CSS file, and... wait, why is this reader completely ignoring my entire stylesheet? You use an “at” rule that ADE doesn’t understand, and it ignores the entire stylesheet again. You use margin: auto to center something, and the content remains left-justified. And so on.

And before you ask how a reader is supposed to handle CSS that didn’t exist when it was first created, I’d just like to point out two things. First, ADE is frequently updated. Second, the CSS spec contains very specific instructions for how to handle unknown CSS, including how to handle failures. This leads me to wonder if Adobe actually read the spec, because their reader pretty much ignores that entire section from top to bottom.
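
For reference, the error-handling rules in question (section 4.2 of CSS 2.1) boil down to “drop the part you don’t understand and keep going.” In a sketch like the following, a conforming parser should still apply every declaration marked as kept. The at-rule name, the property name, and the class are made up for illustration.

    /* Unknown at-rule: ignored, block and all. */
    @some-future-rule {
        p { color: red; }
    }

    p.example {
        text-indent: 1.5em;        /* kept */
        some-future-property: 1;   /* unknown property: only this declaration is dropped */
        margin-top: banana;        /* illegal value: only this declaration is dropped */
        margin-bottom: 1em;        /* kept */
    }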

Heaven help you if you try to use a self-closing XML tag. ADE sometimes decides not to close the tag, so the tag stays open until the end of the file. (Wait a minute. Why is the last half of my chapter underlined, and in blue?) If ADE were parsing straight HTML, that might make sense, but EPUB books use XHTML, which is XML. And self-closing tag support is mandatory for all working XML parsers. Didn’t anybody at Adobe re... you get the idea. (It looks like Adobe is incorrectly applying the HTML5 parsing rules to XML. Those rules are, IMO, broken by design, but that’s another discussion for another day. Either way, applying them to XML is quite clearly incorrect.)

And then, there are the spots where the EPUB 2 specification codifies ADE’s bad practices. For example, the entire concept of a reader treating margin: auto as though you had specified margin: 0 is horribly broken. Yet for some reason, the EPUB 2 spec allows it, and ADE does it. Thankfully, I’m pretty sure that this brain damage is absent from the EPUB 3 spec, so if and when Adobe actually gets around to fully supporting EPUB 3 in ADE, at least that particular design horror will finally die, along with the need to use piles of SVG to work around that limitation.
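
For simple cases, one commonly suggested alternative to margin: auto (not the SVG approach mentioned above) is to center inline content from its container with text-align. This is a sketch under assumptions: div.centered is a hypothetical wrapper around an image, and the rules are untested here against any particular ADE-based reader.

    div.centered {
        text-align: center;    /* centers inline children, including images */
        text-indent: 0;        /* an inherited indent would push the content off center */
        margin: 1em 0;
    }

    div.centered img {
        display: inline;       /* keep the image inline so text-align applies to it */
        max-width: 100%;
    }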

Ow, My iBooks
And would you quit shoving me down?

Before I complain too much about iBooks, I should start by saying that I found iBooks to be probably the most well-behaved reader platform that I had the pleasure of working with. This came as no surprise, given that it is built atop the same WebKit engine that powers the Safari web browser, and that was later forked to create the Blink engine used in Chrome. Then again, I’d have expected Kindle to be well-behaved for the same reason, yet it isn’t.

iBooks does have a few quirks, of course. The most obvious quirk is that you have to add an extra file, or else it ignores large chunks of your stylesheet, such as embedded fonts. (This may not be required in newer versions.)

iBooks also uses some magic WebKit properties like -webkit-line-box-contain in ways that can wreak havoc for complex layouts (such as drop caps) if you don’t override them. Because there was no desktop version of iBooks at the time, it was also not practical to examine the stylesheet that it used, so I ended up pinging the iBooks team with a bug report to find out how to fix that one. (Thanks for your help, BTW.)
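
The kind of override involved might look like the sketch below. It assumes the reader’s stylesheet sets -webkit-line-box-contain to something other than WebKit’s usual default of block inline replaced, and it uses a hypothetical p.dropcap class. The value a given iBooks version actually needs may differ.

    p.dropcap {
        /* Restore WebKit's normal line-box calculation for this paragraph. */
        -webkit-line-box-contain: block inline replaced;
    }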

iBooks also had a lot of trouble dealing with SVG text layout prior to iOS 8 and OS X v10.10. There are ways to work around the problem in older versions, but they involve serious surgery to your SVG. I wrote some filtering code to do the fixup for me, and if you still care about older versions of iBooks, you can find that code on the GP EPUB Tips page.

Finally, iBooks is constantly evolving, and periodically adds new things into its stylesheet using universal selectors. This can result in nasty surprises when new operating systems come out. Fortunately, their releases are fairly predictable, and their support tends to be mostly consistent across hardware (even between Macs and iOS devices).

Where Do We Go From Here?
Figuring out all of these quirks by ourselves would be inconceivable.

By the time you read this article, everything in it will be out of date. So where can writers go to figure out how to fix problems? I would recommend the MobileRead forums. Their EPUB and Kindle forums are filled with knowledgeable people who are always willing to help people find ways to work around problems.

Additionally, their wiki site has pages on EPUB, MOBI, and KF8 that provide page after page of very specific tips on HTML, CSS, and overall content design that are invaluable in navigating the muddy waters of eBook creation. Whenever I discover a new reader quirk, I post it on one of those pages, and I encourage others to do so as well, so that it can serve as a nearly complete, up-to-date resource for folks trying to create eBooks.

Either way, count on spending about 10% of your total content design time actually making the content work correctly on a sane web browser (e.g. Safari or Chrome), and the other 90% working around bugs in various readers. To give you an idea of how bad the current state of the eBook reader world is, the three wiki pages mentioned above contain, in total, about ten pages of errata (bugs), at about two or three lines apiece, on average. That’s a lot of issues to work around. I seriously would not recommend that any non-programmer even attempt to create eBooks by hand. You’ll go nuts. And even with the commercially available tools out there, things don’t always work as well as they ought to (at least from what I’ve read).

Someday, when all the readers support EPUB 6 or 7 (consistently), maybe things will be standardized enough and flexible enough for eBooks to not be such a living nightmare, but for now, with the standards we actually have today (EPUB 2 and 3) and the devices that support them today, they are.

So if you’ve ever wondered why eBooks tend to be priced almost as high as the print books, it’s largely because if you want to actually do a good job, they’re harder to produce than print books. And because fewer people read the electronic versions of most books, that initial design cost has to be spread over a smaller number of copies, largely canceling out the benefits of lower manufacturing and distribution cost, at least until you get into large-volume titles.

Keep reading in Part III: Electronic Release (It really shouldn’t be this hard).