Static site generation

A few months ago I described how I switched this blog from WordPress to my own static blog generation tool called DmBlog. DmBlog was quick and dirty but worked really well, and was a big improvement in usability over WordPress[1].

I also found that generating the site made it easy to change the design and to make invisible tweaks to the output HTML, so I wanted to extend DmBlog to cover the non-blog sections of my site as well.

Failure case

DmBlog worked partly because it was hacked together. I didn’t make any effort to give it a long-term design, or worry about the cleanliness of the implementation. It got done, it did what it needed to and that was it. Although the design did need tidying up for the rewrite, I initially went too far towards making it a generic, reusable tool.

In particular I tried to make the HTML generation itself be defined purely by configuration and templates, i.e. the Perl code itself would have no knowledge of how the site is structured and would contain no parts of the HTML design. I started defining a mixed template/logic control language in XML syntax (like XSLT), which turned out to be a terrible idea (like XSLT). It was miserable to write, and it became clear that the model would struggle to represent even the current site behaviour cleanly and would likely fall apart on any kind of real change. Even if it had worked, had I come back six months later to make some changes, I would never have remembered the fine details of how it worked.

At this point the rewrite had become a big chore and ground to a halt. When I came back to it I freed myself from the “no site knowledge in Perl” requirement and everything became much easier[2]. The result is a new tool called DmSite, which is much cleaner than DmBlog but still pragmatic.

What’s new

Inputs and outputs

DmSite has a cleaner directory structure consisting of src, build and output. src holds the files I create: both the category/series definitions and the content (pages, media files, CSS, etc.).

The src files are processed by DmSite to populate the output directory, which can be synced directly to the remote site. build contains intermediate files generated by DmSite.

The directory structure of src is approximately replicated in output. In the case of the blog, each post is a directory under src/content/blog, but the output path is determined by the declared publication date, so src/content/blog/test-post may become output/blog/2014/01/test-post.
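As a rough illustration of that date-based mapping, here is a minimal Python sketch. DmSite itself is written in Perl; the function name `blog_output_path`, and the assumption that the publication date has already been parsed into a `date`, are mine for illustration only.

```python
from datetime import date
from pathlib import PurePosixPath


def blog_output_path(src_dir: str, published: date, output_root: str = "output") -> str:
    """Map a post directory under src/content/blog to a dated output path.

    Hypothetical helper: only the post's directory name (its slug) is kept;
    the year and month come from the declared publication date.
    """
    slug = PurePosixPath(src_dir).name  # e.g. "test-post"
    return str(PurePosixPath(output_root, "blog",
                             f"{published.year:04d}",
                             f"{published.month:02d}",
                             slug))


print(blog_output_path("src/content/blog/test-post", date(2014, 1, 15)))
```

With a (made-up) publication date of 15 January 2014 this prints `output/blog/2014/01/test-post`, matching the example above. Keeping the source directory free of dates means a post can be rescheduled by editing its metadata rather than moving files around.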

Entries under content may change names or types as they are processed, as discussed in the rendering section below.

Pages and markup

As with DmBlog the page content itself is represented as simple files which look a bit like this:

title: My first page

---

---++ Introduction
Hello world!

There are two concepts in that simple example. Firstly, the idea of the ‘Page’, which is the structure of the file: metadata at the top, a dividing line, then the page content. Secondly, the ‘Markup’ language of the content. In DmBlog these concepts were mashed together; now they are properly separated.
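To make the ‘Page’ half of that split concrete, here is a minimal Python sketch of reading a Basic-style file. It assumes metadata lines are `key: value` pairs and that the header ends at the first line consisting solely of `---`; the function name `parse_basic_page` is hypothetical, and this is not DmSite's actual Perl code.

```python
def parse_basic_page(text: str) -> tuple[dict[str, str], str]:
    """Split a Basic-style page into (metadata, content).

    Assumes 'key: value' metadata lines, terminated by the first line
    that is exactly '---'. Lines such as '---++ Heading' belong to the
    content, because they are not a bare divider.
    """
    lines = text.splitlines()
    try:
        divider = next(i for i, line in enumerate(lines) if line.strip() == "---")
    except StopIteration:
        raise ValueError("page has no '---' divider line") from None

    metadata: dict[str, str] = {}
    for line in lines[:divider]:
        if not line.strip():
            continue  # blank lines in the header are ignored
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"bad metadata line: {line!r}")
        metadata[key.strip()] = value.strip()

    # The content is passed on untouched; interpreting it is the Markup's job.
    content = "\n".join(lines[divider + 1:]).strip("\n")
    return metadata, content


page = "title: My first page\n\n---\n\n---++ Introduction\nHello world!\n"
meta, body = parse_basic_page(page)
```

Here `meta` is `{"title": "My first page"}` and `body` still contains the raw `---++ Introduction` markup, which is exactly the point of the separation: the Page layer knows nothing about the Markup.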

So far I’ve only implemented one form of Page (‘Basic’) and one Markup (‘Short’). However, the design makes it relatively easy to introduce new markup languages within Basic: I’d just have to add a piece of metadata at the top identifying the markup language in use.
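A registry keyed on a metadata value is one way that selection could look. The sketch below is hypothetical throughout: the `markup` metadata key, the default of `Short`, and the toy Short rules (a TWiki-style `---++` heading whose `+` count gives the level, every other non-blank line becoming a paragraph) are my assumptions, not DmSite's real behaviour.

```python
import html
import re
from typing import Callable

# Hypothetical registry: markup name -> function producing intermediate markup.
MARKUPS: dict[str, Callable[[str], str]] = {}


def markup(name: str) -> Callable[[Callable[[str], str]], Callable[[str], str]]:
    """Decorator registering a markup renderer under a name."""
    def register(fn: Callable[[str], str]) -> Callable[[str], str]:
        MARKUPS[name] = fn
        return fn
    return register


HEADING = re.compile(r"^---(\++)\s*(.*)$")


@markup("Short")
def render_short(src: str) -> str:
    # Toy rules (assumed): '---++ Title' -> <h2>, other lines -> <p>.
    out = []
    for line in src.splitlines():
        if not line.strip():
            continue
        m = HEADING.match(line)
        if m:
            level = min(len(m.group(1)), 6)
            out.append(f"<h{level}>{html.escape(m.group(2))}</h{level}>")
        else:
            out.append(f"<p>{html.escape(line)}</p>")
    return "".join(out)


def content_to_intermediate(metadata: dict[str, str], content: str) -> str:
    # The 'markup' key is an assumed name; pages without it default to Short.
    name = metadata.get("markup", "Short")
    fn = MARKUPS.get(name)
    if fn is None:
        raise ValueError(f"unknown markup language: {name}")
    return fn(content)
```

Existing pages keep working unchanged because they carry no `markup` key, and adding a new language is just another decorated function.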

There may be reasons for other kinds of Page class, either because I come up with something better than Basic or because I have source data that is already structured. This could even be something that isn’t obviously written content, such as a set of data files, which could then be translated into HTML.

This abstraction relies on an intermediate format in XML. The format is a mixture of HTML and some extra tags, under a new namespace, for the new features like footnotes and the easy-to-use image tags previously seen in DmBlog. The Page class supplies the intermediate format to the page rendering stage.
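To give a feel for such a mixed document, here is a small Python sketch using the standard library's ElementTree. The namespace URI `urn:example:dmsite` and the element names `footnote` and `image` are placeholders I've made up; the real format isn't reproduced in this post.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for the extension tags.
DM_NS = "urn:example:dmsite"

SAMPLE = f"""<page xmlns:dm="{DM_NS}">
  <h2>Introduction</h2>
  <p>Hello world!<dm:footnote>For me at least.</dm:footnote></p>
  <dm:image src="photo.jpg" caption="A photo"/>
</page>"""


def extension_elements(xml_text: str) -> list[str]:
    """Return the local names of extension elements, in document order."""
    root = ET.fromstring(xml_text)
    prefix = f"{{{DM_NS}}}"  # ElementTree spells namespaced tags as '{uri}name'
    return [el.tag[len(prefix):] for el in root.iter() if el.tag.startswith(prefix)]


print(extension_elements(SAMPLE))
```

This prints `['footnote', 'image']`. Plain HTML elements such as `h2` and `p` come through unqualified, so a renderer can pass them through and only needs special handling for the namespaced extensions.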

Rendering

Each file in the src/content directory is rendered. In some cases the rendering is simply a copy operation, e.g. for zip file downloads, but the model allows arbitrary work:

  • Audio files - Translated to mp3 and ogg formats
  • CSS - Minification
  • JavaScript - Simple minification
  • JPEG - Optimised with jpegtran
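The per-type processing, including the name and type changes mentioned earlier, can be pictured as a lookup from source suffix to output suffixes plus an action, with a plain copy as the fallback. This Python sketch only plans the work rather than doing it; the suffix table (for example `.wav` as the audio source format and `.txt` for pages) and the function name `plan` are assumptions for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical rule table: source suffix -> (output suffixes, action).
# Anything not listed (e.g. .zip downloads) is copied unchanged.
RULES: dict[str, tuple[list[str], str]] = {
    ".wav": ([".mp3", ".ogg"], "transcode audio"),
    ".css": ([".css"], "minify CSS"),
    ".js": ([".js"], "minify JavaScript"),
    ".jpg": ([".jpg"], "optimise with jpegtran"),
    ".jpeg": ([".jpeg"], "optimise with jpegtran"),
    ".txt": ([".html"], "render page"),
}


def plan(src: str, src_root: str = "src/content",
         out_root: str = "output") -> list[tuple[str, str]]:
    """Return (output path, action) pairs for one source file."""
    rel = PurePosixPath(src).relative_to(src_root)
    suffixes, action = RULES.get(rel.suffix.lower(), ([rel.suffix], "copy"))
    return [(str(PurePosixPath(out_root) / rel.with_suffix(s)), action)
            for s in suffixes]


print(plan("src/content/music/song.wav"))
print(plan("src/content/files/tool.zip"))
```

One source file can fan out into several outputs (the audio case), which is why the planner returns a list rather than a single path.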

The most complex rendering is for pages, where the intermediate XML format is translated into HTML. DmSite actually supports several renderers for pages: the full version used for the website itself, a simplified version that appears in the Atom feeds and a very simple version that generates the summaries shown on index pages. In each case the renderer has to decide which parts of the intermediate XML to implement and which to remove.
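Here is a Python sketch of two renderers working from the same intermediate tree: a "full" one that turns footnotes into numbered references with a list at the end, and a "summary" one that simply drops every extension element. The namespace URI and the `footnote` element name are hypothetical, and other extensions such as images are deliberately left unhandled in the full renderer to keep it short.

```python
import copy
import xml.etree.ElementTree as ET

DM = "{urn:example:dmsite}"  # hypothetical extension namespace


def _drop(parent: ET.Element, child: ET.Element) -> None:
    """Remove child from parent without losing the text that follows it."""
    tail = child.tail or ""
    siblings = list(parent)
    i = siblings.index(child)
    if i > 0:
        siblings[i - 1].tail = (siblings[i - 1].tail or "") + tail
    else:
        parent.text = (parent.text or "") + tail
    parent.remove(child)


def render_full(root: ET.Element) -> ET.Element:
    """Footnotes become <sup>[n]</sup> plus an <ol> appended to the root."""
    root = copy.deepcopy(root)
    parents = {c: p for p in root.iter() for c in p}
    notes = ET.Element("ol", {"class": "footnotes"})
    for fn in list(root.iter(DM + "footnote")):  # document order
        li = ET.SubElement(notes, "li")
        li.text = fn.text  # nested markup inside a footnote is ignored here
        sup = ET.Element("sup")
        sup.text = f"[{len(notes)}]"
        sup.tail = fn.tail
        parent = parents[fn]
        parent[list(parent).index(fn)] = sup
    if len(notes):
        root.append(notes)
    return root


def render_summary(root: ET.Element) -> ET.Element:
    """Drop every extension element, keeping the surrounding text."""
    root = copy.deepcopy(root)
    parents = {c: p for p in root.iter() for c in p}
    for el in [e for e in root.iter() if e.tag.startswith(DM)]:
        parent = parents.get(el)
        if parent is not None:
            _drop(parent, el)
    return root


src = ET.fromstring('<div xmlns:dm="urn:example:dmsite">'
                    '<p>Hello<dm:footnote>Note one.</dm:footnote> world.</p></div>')
print(ET.tostring(render_full(src), encoding="unicode"))
print(ET.tostring(render_summary(src), encoding="unicode"))
```

Both renderers copy the tree first, so the same parsed page can be fed to each in turn. The `_drop` helper matters because ElementTree stores the text after an element in its `tail`, and a naive `remove` would silently lose " world.".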

The optimisation and minification functions help with client download speeds and form part of Google’s site speed score, which may in turn affect the site’s Google search ranking.
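For a sense of what "minification" means in practice, here is a deliberately naive CSS minifier in Python. It is a sketch of the general technique, not the tool DmSite uses: it strips comments and redundant whitespace with regular expressions and does not understand quoted strings or `url(...)` values, so it could damage stylesheets that contain `/*`, `;}` or significant spacing inside quotes.

```python
import re


def minify_css(css: str) -> str:
    """Naive CSS minifier: removes comments and redundant whitespace.

    Not string-aware: content inside quotes is treated like any other text.
    """
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)  # remove /* ... */ comments
    css = re.sub(r"\s+", " ", css)                   # collapse runs of whitespace
    css = re.sub(r"\s*([{};,>])\s*", r"\1", css)     # trim around punctuation
    css = re.sub(r":\s+", ":", css)                  # 'margin: 0' -> 'margin:0'
    css = css.replace(";}", "}")                     # drop the final semicolon
    return css.strip()


src = "/* header */\nbody {\n  margin: 0;\n  color: #333;\n}\n\nh1, h2 { font-weight: bold; }\n"
print(minify_css(src))
```

This prints `body{margin:0;color:#333}h1,h2{font-weight:bold}`. Whitespace before a colon is left alone on purpose, because in a selector such as `div :first-child` that space is significant.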

Conclusions and further work

This site is now fully rendered by DmSite, so it does at least work.

I still need to do more to finish optimising for client performance. There is currently no minification of the generated HTML or PNGs. The JPEG optimisation works, but only for images hosted on my site, whereas many of my photos are hosted directly on SmugMug.

I’m also considering adding some kind of built-in support for pages about coding projects: for example, automatically generating listings of available downloads, or combining downloads and news into a single Atom feed.

  1. For me at least; I’m not suggesting this feeling is universal.
  2. Which makes sense: Perl is a better language for manipulating text data than any I’ll ever invent.