Skarb 💎

How exactly are Kindle dictionaries made?

The format for Kindle dictionaries is somewhat documented by Amazon on this page. By the looks of it, the format has remained fairly consistent across the years and across Kindle versions.

In its simplest form, a Kindle dictionary is a collection of:

In addition to these, it’s possible to add CSS files for optional styling of the dictionary contents.
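To make the HTML content a bit more concrete, here is a minimal sketch of how a single entry could be rendered. It assumes the `idx:` markup described in Amazon's Kindle Publishing Guidelines (`idx:entry`, `idx:orth`, `idx:infl`, `idx:iform`). The function name `render_entry` and the sample word are purely illustrative and are not taken from the Skarb code.

```python
from html import escape


def render_entry(headword, definition, inflections=()):
    """Return one dictionary entry as an HTML fragment (hypothetical helper).

    The idx: tags follow the markup described in Amazon's publishing
    guidelines; the exact attributes used here are an assumption.
    """
    head = escape(headword)
    # Each inflected form lets the Kindle find the entry when the reader
    # looks up a variant of the headword.
    iforms = "".join(
        f'<idx:iform value="{escape(form)}"/>' for form in inflections
    )
    infl = f"<idx:infl>{iforms}</idx:infl>" if iforms else ""
    return (
        '<idx:entry name="default" scriptable="yes" spell="yes">'
        f'<idx:orth value="{head}"><b>{head}</b>{infl}</idx:orth>'
        f"<p>{escape(definition)}</p>"
        "</idx:entry>"
    )


# Illustrative sample data only.
entry = render_entry("cat", "a small domesticated feline", ["cats"])
print(entry)
```

The full HTML file would wrap many such fragments in a regular HTML document, which the OPF manifest then points to.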

Amazon used to provide a command-line utility called kindlegen that compiles and compresses the dictionary building blocks (cover image, OPF manifest, HTML content) into a .MOBI file. kindlegen was discontinued sometime in 2020 and replaced by Kindle Previewer, which has a graphical interface for testing and compiling new ebooks but unfortunately is not supported on Linux (which I tend to use whenever I work on a development project). So I ferreted out an old copy of kindlegen from a dark corner of the Internet, and I’m using that as part of the project.
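As a rough sketch of the compile step, the snippet below shows one way kindlegen could be driven from Python. It assumes the binary is on the `PATH` and that the `-c1`, `-verbose` and `-o` options behave as listed in kindlegen's own usage text. The function names and the file names `skarb.opf` and `skarb.mobi` are hypothetical placeholders, not the project's actual layout.

```python
import subprocess


def build_kindlegen_command(opf_path, output_name,
                            kindlegen="kindlegen", compression="-c1"):
    """Assemble the kindlegen argument list (hypothetical helper).

    opf_path is the OPF manifest; output_name is the file name of the
    resulting .mobi, which kindlegen writes next to the OPF file.
    """
    return [str(kindlegen), str(opf_path), compression, "-verbose",
            "-o", str(output_name)]


def compile_dictionary(opf_path, output_name):
    """Run kindlegen and return its exit code.

    check=False is deliberate: interpreting a nonzero exit code is left
    to the caller, since kindlegen may also signal warnings that way.
    """
    cmd = build_kindlegen_command(opf_path, output_name)
    result = subprocess.run(cmd, check=False)
    return result.returncode


# Show the command that would be run, without invoking kindlegen.
print(build_kindlegen_command("skarb.opf", "skarb.mobi"))
```

Calling `compile_dictionary("skarb.opf", "skarb.mobi")` would then run the binary. Building the argument list separately makes it easy to inspect the command before anything is executed.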

How is Skarb made?

To generate the corpus I use the following sources:

I then use a small Python script to: