August 15th, 2009

When worlds collide

My day job is working as an IT manager for a large legal and regulatory publishing company. My passions are writing spec fic, reading spec fic and other genres, and (recently) learning about ebooks. Since I got my Kindle, I have become convinced that digital publishing is the imminent future of recreational reading. By "imminent," I mean that I believe over the next twenty or so years, ebooks will eclipse print books. Note that I said "eclipse," not kill. I think some kinds of books will be printed for a lot longer than that.

My two worlds, how I earn my living and how I spend my recreational time, collide in one simple fact. Ebooks have a lot more errors in them than print books do: typos, weird formatting, run-together paragraphs, and so on. Seeing an error like "hesi- tant" gives me a strong sense of déjà vu, because I deal with those kinds of workflow problems in my day job. The problem is that content is being authored in one system and produced (printed) in another. The author writes a book in Word or other software. The editor might edit the manuscript in the same software, but when it's ready to become a book, it gets loaded into layout software, like Quark or InDesign. At that point, there are two versions of the book.

Once a book is in layout mode, changes are made to that version. If a word at the end of a line needs to be hyphenated, the layout folks insert a hyphen and a line break (or possibly the software does it automatically). Then that layout version is used to generate a PDF to send to a printer. And in many cases, that PDF is used to create the Kindle or other ebook version. PDF is not ebook-friendly. Sometimes the conversion software can't tell where the paragraphs break; sometimes it has problems with hard hyphens inside words.
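To make the hyphen problem concrete, here is a minimal Python sketch of how an error like "hesi- tant" can arise and why it is awkward to repair afterward. It is only an illustration, not how any particular conversion tool works. The function names, the sample lines, and the tiny word list are all made up for the example.

```python
import re

def extract_text_naively(layout_lines):
    """Join laid-out lines with spaces, the way a simple text extractor might.

    Hypothetical helper: a hard hyphen at the end of a line survives as '- '.
    """
    return " ".join(line.strip() for line in layout_lines)

def rejoin_hyphenated(text, known_words):
    """Repair 'hesi- tant' style splits, but only when a word list confirms the fix.

    known_words is an assumed dictionary of valid spellings (lowercase).
    """
    def fix(match):
        head, tail = match.group(1), match.group(2)
        if (head + tail).lower() in known_words:
            return head + tail            # layout hyphen: drop it
        if (head + "-" + tail).lower() in known_words:
            return head + "-" + tail      # real hyphen: keep it, drop the space
        return match.group(0)             # unsure: leave the text alone
    return re.sub(r"(\w+)- (\w+)", fix, text)

layout_lines = ["She gave a hesi-", "tant nod to the well-", "known author."]
raw = extract_text_naively(layout_lines)
print(raw)   # She gave a hesi- tant nod to the well- known author.

fixed = rejoin_hyphenated(raw, {"hesitant", "well-known"})
print(fixed) # She gave a hesitant nod to the well-known author.
```

The repair step has to guess whether each hyphen came from the author or from the layout, and it can only guess correctly for words it already knows. That guesswork is exactly what gets lost when the ebook is built from the layout output instead of the original text.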

But if the publisher instead provides the original word-processing (or maybe RTF) version to Amazon or other ebook producers, last-minute corrections made in layout, such as fixes to misspelled words, won't appear in that version. Information publishers faced this conundrum years ago as they moved to web and CD-based publishing. The answer is a content management system where the data (books, even novels, are data, too) resides in a database, in a format such as SGML (Standard Generalized Markup Language) or XML (Extensible Markup Language). SGML and XML files can still be easily edited and can also be used to produce any format, print or digital. In addition to reducing errors, this can also make it easier for the ebook table of contents entries to work as links.
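As a rough illustration of the single-source idea, the Python sketch below keeps a tiny book in XML and generates both a print-style text version and an ebook-style HTML version with a linked table of contents. The element names (book, chapter, title, para), the id values, and the two output functions are invented for this example; a real publishing schema would be far richer.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical single-source manuscript; the schema is an assumption.
SOURCE = """<book>
  <chapter id="ch1"><title>Arrival</title><para>She gave a hesitant nod.</para></chapter>
  <chapter id="ch2"><title>Departure</title><para>The ship left at dawn.</para></chapter>
</book>"""

def to_print_text(book):
    """Render chapters as plain text, with titles in capitals."""
    parts = []
    for chapter in book.findall("chapter"):
        parts.append(chapter.findtext("title").upper())
        parts.extend(para.text for para in chapter.findall("para"))
    return "\n\n".join(parts)

def to_ebook_html(book):
    """Render HTML whose table of contents links to each chapter heading."""
    toc_items = []
    sections = []
    for chapter in book.findall("chapter"):
        anchor = escape(chapter.get("id"))
        title = escape(chapter.findtext("title"))
        toc_items.append('<li><a href="#' + anchor + '">' + title + "</a></li>")
        paras = "".join("<p>" + escape(p.text) + "</p>" for p in chapter.findall("para"))
        sections.append('<h2 id="' + anchor + '">' + title + "</h2>" + paras)
    return "<ul>" + "".join(toc_items) + "</ul>" + "".join(sections)

book = ET.fromstring(SOURCE)
print_version = to_print_text(book)
ebook_version = to_ebook_html(book)
print(print_version)
print(ebook_version)
```

Because both outputs are generated from the same file, a last-minute spelling fix made in the XML shows up in print and ebook alike, and the table of contents links fall out of the chapter ids rather than being rebuilt by hand.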

In other words, it's time for book publishers to move into the 21st Century.
