
Generating ePub Books From HTML

Converting a single HTML file to an ePub is straightforward, and many free tools can do it. But if your goal is to convert several HTML files, using only part of each one, into an eBook with a proper table of contents, cover image, and so on, what do you do?

This was exactly the crossroads I found myself at when attempting to create an ePub version of my book. Each chapter of the book was its own web page, and I needed an automated way to quickly download all of them and combine them into an eBook. To make things more interesting, only a portion of each page was needed; who wants to see a web page's header, footer, and navigation bar in an ePub? Additionally, images had to be downloaded and embedded into the ePub, and GitHub Gist code snippets had to be downloaded and rendered without GitHub's JavaScript embed tags.

All of these requirements are necessary for a professional ePub, yet, surprisingly, no existing tool could handle them without considerable manual effort. Like any good software developer faced with a choice between manual work and a tool that doesn't exist yet, I took the laziest path and built a new tool to get the job done.

Introducing html2epub

That new tool is called html2epub. It is a command-line app that can:

  • Generate a professional-looking ePub from a series of web pages
  • Strip out unnecessary HTML
  • Convert HTML into XHTML so that it complies with the ePub spec
  • Embed images
  • Embed Gist code snippets
  • Rewrite chapter-to-chapter links for proper ePub navigation
  • Support Table of Contents navigation
  • Support forms-based authentication
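To make the link-rewriting step more concrete, here is a minimal Python sketch of one way it could work. It is not html2epub's actual implementation. It assumes a hypothetical mapping from each chapter's web URL to the XHTML file that chapter becomes inside the ePub (the example.com URLs and chapterN.xhtml names are made up), and it uses a simple regular expression that only matches double-quoted href attributes, which is reasonable only for tidy XHTML produced after the cleanup step.

```python
import re
from urllib.parse import urldefrag, urljoin

# Hypothetical mapping: chapter URL (no trailing slash) -> file inside the ePub.
CHAPTER_FILES = {
    "https://www.example.com/intro": "chapter1.xhtml",
    "https://www.example.com/patterns": "chapter2.xhtml",
}

# Simplification: only double-quoted href attributes are handled.
HREF_RE = re.compile(r'href="([^"]*)"')


def rewrite_links(html: str, page_url: str, chapter_files: dict[str, str]) -> str:
    """Point links between chapters at their in-book files, keeping #fragments."""

    def replace(match: re.Match) -> str:
        # Resolve relative links ("patterns", "/patterns#sync") against the page URL.
        absolute = urljoin(page_url, match.group(1))
        base, fragment = urldefrag(absolute)
        target = chapter_files.get(base.rstrip("/"))
        if target is None:
            return match.group(0)  # not a chapter (e.g. external site): leave as-is
        return f'href="{target}#{fragment}"' if fragment else f'href="{target}"'

    return HREF_RE.sub(replace, html)
```

For example, on the hypothetical intro page a link to /patterns#sync becomes chapter2.xhtml#sync, while a link to https://github.com is left untouched because it is not one of the book's chapters.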

I have tried to keep this utility as simple to use as possible, despite its many features. Let’s look at how to get started.

Getting Started

On macOS, installing html2epub is greatly simplified by Homebrew. Simply run:

brew install jwhitehorn/brew/html2epub

This will download and install html2epub and its dependencies, and register the command in your PATH. With that done, you can generate an ePub as easily as:

html2epub --url https://www.datasyncbook.com \
  --toc ./example/toc.xhtml \
  --cover ./example/cover.png \
  --contents ./example/contents.json \
  --title "Data Synchronization" \
  --subtitle "Patterns, Tools, & Techniques" \
  --author "Jason Whitehorn"
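For context on what a command like this ultimately produces: an ePub is a ZIP archive with a layout defined by the OCF part of the ePub spec. The archive must begin with an uncompressed entry named mimetype containing application/epub+zip, followed by META-INF/container.xml pointing at the package (.opf) file. The Python sketch below shows only that packaging step. It assumes the OPF and chapter files have already been generated, treats OEBPS/content.opf as an assumed location, and illustrates the container format rather than how html2epub itself writes its output.

```python
import io
import zipfile

# Standard OCF container file: tells reading systems where the package (.opf) lives.
# "OEBPS/content.opf" is an assumed path; any location works if this points to it.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub(files: dict[str, str]) -> bytes:
    """Zip `files` (archive path -> text) into an ePub container and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        # "mimetype" must be the first entry and stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML, compress_type=zipfile.ZIP_DEFLATED)
        # Remaining content (OPF package, XHTML chapters, images...) can be compressed.
        for name, text in files.items():
            zf.writestr(name, text, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()
```

Writing the returned bytes to a file such as book.epub gives an archive that a validator like epubcheck can inspect, provided the OPF and chapter files you pass in are themselves valid.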