Introducing static blogging

Introduction

This is a meta-post, a blog post about the blog. I’ve made a few changes to the design and implementation, hopefully for the better. The design changes were partly inspired by Matt Gemmell’s post and partly by other, strongly expressed, opinions.

On the implementation side I’ve managed to remove the dependency on jQuery to lay out the page elements, despite the best efforts of CSS to make a simple design almost impossible to implement. Dumping jQuery makes a big difference to total page download size: it’s a 90KB library when the content itself may only be 20KB. It also means the site will render correctly with JavaScript disabled.

But the biggest change is that the blog is no longer driven by Wordpress.

What’s wrong with Wordpress

I’m not criticising Wordpress in itself. It’s obviously a very powerful tool and a very successful project, but there are a few things that don’t suit me. One is that posts are held in a database on the server, rather than as files on my local machine. Another is that it requires some maintenance: keeping the software and add-ons up to date, worrying about brute force attacks on wp-admin, etc. Lastly, some authoring tasks were taking longer than I’d like, and I ended up writing a lot of raw HTML in posts, which itself was vulnerable to future design changes. Possibly there are plugins to help with the last part, but becoming a Wordpress expert is a task in itself.

I was attracted to the idea of static blogging. With this model all of the site generation for the blog is done on my local machine and the output pages are uploaded to the server. These pages are then served as simple HTML files and so don’t require any extra code executing on the server. This avoids all the additional admin and security issues, and it’s also more efficient if server load is an issue. There are whole categories of things you can’t do with static blogs, but for a very simple site like this they don’t apply.

For the static blogging software I had a look at Octopress, but it failed to install on my machine, so I decided, “Screw it, I’ll write my own, how hard can it be?”

DmBlog

What I’ve come up with I’m calling DmBlog for lack of a better name. The code[1] is really at proof-of-concept quality level but it works well enough. I’ve now migrated my whole blog, and what you’re reading was generated by the new system.

DmBlog supports:

  • Post to post navigation
  • Lists of posts, e.g. main page, category pages
  • Reasonably smart media handling
  • Atom feed generation
  • A markup language which takes some of the grind out of HTML

The blog is defined as a group of directories and files. Each post is modelled as a directory containing a file with the page content and meta-data, and any associated local resources (images and audio). Additionally there are other directories describing categories and series.

Categories and series

DmBlog supports categories, with the normal list of posts by category and an Atom feed for each category. It also supports the idea of a series.

A series behaves much like a category but with a strongly implied reading order. For example, my E-M5 vs X100S posts are declared as a series and on each page DmBlog will generate a navigation box to move around the series.
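As a rough illustration of the navigation box logic (in Python rather than the Perl DmBlog is written in), a series can be treated as an ordered list of post keys, with the box linking to the neighbours of the current post. The function name and the post keys here are hypothetical, not taken from DmBlog.

```python
from typing import List, Optional, Tuple


def series_neighbours(series: List[str], current: str) -> Tuple[Optional[str], Optional[str]]:
    """Return the (previous, next) post keys around `current`, or None at either end."""
    index = series.index(current)  # raises ValueError if the post is not in the series
    previous = series[index - 1] if index > 0 else None
    following = series[index + 1] if index + 1 < len(series) else None
    return previous, following


# Hypothetical post keys, listed in reading order.
camera_series = ["part-1", "part-2", "part-3"]
print(series_neighbours(camera_series, "part-2"))
```

The only thing that separates this from a category is that the order of the list is meaningful.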

Markup

The markup language, named “Short”, is an unholy combination of XML, LaTeX, TWiki, Markdown and some original invention. I can write all content in valid XML-formatted HTML, but the additional features in Short can save a lot of time.

For a complete example you can view the source of this page.

Pre-XML

The main processing is all done in the XML domain as this saves me having to write a real parser. But I did want to save some of the authoring cost of XML to do with escaping.

Particularly when writing about programming the need to escape the special characters of <, > and & becomes an issue. In pure XML/HTML to output <html> I have to type &lt;html&gt; which becomes very irritating, very quickly[2] and is painful for large sections of code.

To help with this the first stage of parsing is some pure text processing, i.e. not XML markup aware, to escape special characters. Anything between braces, { and }, is automatically escaped. Single braces escape, whilst double braces escape and tell Short to render the content as code. Braces may seem like a terrible choice as they themselves appear frequently in code, but this is solved by adding a label adjoining the braces[3]. The following will process correctly:

CODE{{
if (a_test)
{
  do_something();
}
}}CODE

Once this stage is complete there’s a well-formed XML string that can be parsed.
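A minimal sketch of this pre-XML stage, written in Python rather than DmBlog’s Perl, might look like the following. It is a simplification under stated assumptions: it only handles labelled braces (LABEL{ ... }LABEL and LABEL{{ ... }}LABEL), and the `<pre><code>` wrapper used for the double-brace form is a guess at the output, not DmBlog’s actual markup.

```python
import re
from xml.sax.saxutils import escape

# LABEL{ ... }LABEL or LABEL{{ ... }}LABEL; the same label must close the block,
# which is what lets bare braces appear inside the body.
LABELLED = re.compile(
    r"\b(?P<label>\w+)(?P<open>\{\{?)(?P<body>.*?)(?P<close>\}\}?)(?P=label)",
    re.DOTALL,
)


def pre_xml(text: str) -> str:
    """Escape &, < and > inside labelled brace blocks; plain text processing only."""

    def replace(match: re.Match) -> str:
        if len(match["open"]) != len(match["close"]):
            return match[0]  # unbalanced braces: leave the text untouched
        escaped = escape(match["body"])
        if match["open"] == "{{":
            # Assumed output wrapper for the "render as code" form.
            return "<pre><code>" + escaped + "</code></pre>"
        return escaped

    return LABELLED.sub(replace, text)


print(pre_xml("To output T{<html>}T you no longer type the entities by hand."))
```

Because the closing label has to match the opening one, the inner `{` and `}` of a code sample never end the block early.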

Attributes as content

One of the aesthetic issues with XML is that elements with lots of attributes can become hard to read. For my invented <image> element (more on that later), as well as src, title and alt there are href and caption attributes, which would lead to some long lines. XML does allow you to break an element over multiple lines, but the source still looks a mess.

As a form of syntactic sugar I’ve allowed certain elements to have their attributes in the element body, which is much neater. The element contents are converted to normal element attributes as part of the parsing.

<image>
src:     http://.../small.jpg
href:    http://.../large.jpg
caption: A picture, click to enlarge
title:   Picture of something
</image>
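The conversion from body lines to attributes can be sketched roughly as below. This is Python, not DmBlog’s Perl, and the function name and the example URL are made up for illustration. The one detail worth noting is splitting on the first colon only, since the values are often URLs.

```python
from xml.sax.saxutils import quoteattr


def body_to_attributes(tag: str, body: str) -> str:
    """Turn 'name: value' lines from an element body into XML attributes."""
    attrs = []
    for line in body.splitlines():
        if not line.strip():
            continue
        # partition() splits on the first colon only, so 'http://...' survives.
        name, _, value = line.partition(":")
        attrs.append(f"{name.strip()}={quoteattr(value.strip())}")
    return f"<{tag} " + " ".join(attrs) + "/>"


body = """
src:     http://example.invalid/small.jpg
caption: A picture, click to enlarge
"""
print(body_to_attributes("image", body))
```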

Implicit and simple markup

Short will scan the text elements in the document for simple non-XML markup. This kind of markup is significantly quicker to enter, if less flexible.

For example, for a section header the XML is <h2>Title</h2>, but Short will also detect and translate the Markdown/ATX style of ## Title or the TWiki style of ---++ Title. Equally, Short will detect the TWiki style of bulleted and numbered lists. Other text which is not in any kind of element is presumed to be a paragraph.

This all means that input:

## Dogs
List of big dogs:
   * Rottweiler
   * Mastiff

will produce the following HTML:

<h2>Dogs</h2>
<p>List of big dogs:</p>
<ul>
  <li>Rottweiler</li>
  <li>Mastiff</li>
</ul>
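A line-by-line sketch of that translation, in Python rather than DmBlog’s Perl, could look like this. It is deliberately simplified: it assumes one paragraph per line, handles only bulleted lists, and ignores any XML elements already present in the text.

```python
import re

# '## Title' (ATX) or '---++ Title' (TWiki); the number of # or + gives the level.
HEADING = re.compile(r"^(?:(#+)|---(\++))\s+(.*)$")
# TWiki bullets are indented and start with '*'.
BULLET = re.compile(r"^\s+\*\s+(.*)$")


def render_simple(text: str) -> str:
    """Translate simple heading, bullet and paragraph markup into HTML."""
    out = []
    in_list = False
    for line in text.splitlines():
        bullet = BULLET.match(line)
        if bullet:
            if not in_list:
                out.append("<ul>")
                in_list = True
            out.append(f"  <li>{bullet[1]}</li>")
            continue
        if in_list:  # any non-bullet line closes an open list
            out.append("</ul>")
            in_list = False
        heading = HEADING.match(line)
        if heading:
            level = len(heading[1] or heading[2])
            out.append(f"<h{level}>{heading[3]}</h{level}>")
        elif line.strip():
            out.append(f"<p>{line.strip()}</p>")
    if in_list:
        out.append("</ul>")
    return "\n".join(out)


print(render_simple("## Dogs\nList of big dogs:\n   * Rottweiler\n   * Mastiff"))
```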

Short also recognises some LaTeX style punctuation, for example:

``Run!'' he shouted --- too late --- as 100--200 badgers attacked

is translated to:

&ldquo;Run!&rdquo; he shouted &mdash; too late &mdash; as 100&ndash;200 badgers attacked

which renders as:

“Run!” he shouted — too late — as 100–200 badgers attacked
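The punctuation handling amounts to an ordered list of substitutions. A small Python sketch (not the Perl that DmBlog uses, and covering only the four forms shown in the example) is below. The order matters: the three-hyphen form has to be replaced before the two-hyphen form.

```python
# Longest patterns first, otherwise '---' would be consumed as '--' plus '-'.
REPLACEMENTS = [
    ("---", "&mdash;"),
    ("--", "&ndash;"),
    ("``", "&ldquo;"),
    ("''", "&rdquo;"),
]


def latex_punctuation(text: str) -> str:
    """Replace LaTeX-style quotes and dashes with HTML entities."""
    for source, entity in REPLACEMENTS:
        text = text.replace(source, entity)
    return text


print(latex_punctuation("``Run!'' he shouted --- too late --- as 100--200 badgers attacked"))
```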

New elements and attributes

As well as making it easier to produce HTML elements, Short contains new elements and new attributes on existing elements which can represent high-level behaviours. When the document is ‘rendered’ the elements are converted to arbitrarily complex HTML, but at the same time other processing and checks can be done.

I’ll mention a few of the new behaviours below.

a

Short adds the new attribute ‘to’ to the HTML <a> element. Targets specified in to are processed by the renderer, allowing new forms of addressing.

The address blog: some-name refers to the blog post with the specified key, and the series prefix references a defined post series.

<a to="blog: introducing-static-blogging">This page</a>
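One way to picture the renderer’s side of this is a lookup from the address to a real URL, failing loudly when the key is unknown so that a broken internal link is caught when the site is generated. The sketch below is Python rather than DmBlog’s Perl, and the tables, the series key and the URL layout are all invented for the example.

```python
# Hypothetical lookup tables; the real ones would be built from the blog directories.
POSTS = {"introducing-static-blogging": "/blog/introducing-static-blogging/"}
SERIES = {"example-series": "/series/example-series/"}


def resolve_target(to: str) -> str:
    """Map an address such as 'blog: some-name' to the URL to put in href."""
    scheme, _, key = to.partition(":")
    scheme, key = scheme.strip(), key.strip()
    table = {"blog": POSTS, "series": SERIES}.get(scheme)
    if table is None:
        raise ValueError(f"unknown address scheme in {to!r}")
    if key not in table:
        raise KeyError(f"no {scheme} entry with key {key!r}")
    return table[key]


print(resolve_target("blog: introducing-static-blogging"))
```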

footnote

The footnote element generates a footnote with the enclosed text, with a marker at the original position of the element.

Other systems can do this as well<footnote>I never said they couldn't</footnote>.

Other systems can do this as well[4].
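The mechanics can be sketched as: pull out each footnote body, leave a numbered marker in its place, and collect the bodies for a list at the end of the page. The Python below is only an illustration. It uses a regular expression where DmBlog works on parsed XML, and it numbers from 1, whereas the marker above is [4] because this page has three earlier footnotes.

```python
import re
from typing import List, Tuple

FOOTNOTE = re.compile(r"<footnote>(.*?)</footnote>", re.DOTALL)


def render_footnotes(text: str) -> Tuple[str, List[str]]:
    """Replace each <footnote> with a [n] marker and return the collected notes."""
    notes: List[str] = []

    def marker(match: re.Match) -> str:
        notes.append(match[1])
        return f"[{len(notes)}]"

    return FOOTNOTE.sub(marker, text), notes


body, notes = render_footnotes(
    "Other systems can do this as well<footnote>I never said they couldn't</footnote>."
)
print(body)
print(notes)
```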

image

The <image> element is an enhanced version of the standard HTML <img>. As well as displaying the image itself it will add presentation <div> elements and optionally a caption and also make the image a link.

For my photography posts I normally want to use images hosted on my Smugmug account to save bandwidth and server space. This previously involved a lot of manual HTML generation, and it was quite fiddly and time consuming. Using <image> takes away a lot of the effort.

The <image> renderer is aware of the page dimensions and will add new rows as required. Ideally this would all be done in pure CSS, but that wasn’t a battle I was ready to face. In the future if I want to make the change to use pure CSS, or other design changes, I just have to change the renderer implementation and reprocess the site.

<image>
alt:     Trees in fog
caption: Ghostly vegetation
href:    http://photo.duncanmartin.com/photos/i-r84GQWj/0/O/i-r84GQWj.jpg
src:     http://photo.duncanmartin.com/photos/i-r84GQWj/0/M/i-r84GQWj-M.jpg
title:   Trees in fog
</image>
Trees in fog
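To give a feel for what the renderer does with those attributes, here is a hedged Python sketch (DmBlog itself is Perl). The class names and the exact HTML structure are assumptions, and the row layout logic described above is left out entirely.

```python
from typing import Dict
from xml.sax.saxutils import escape, quoteattr


def render_image(attrs: Dict[str, str]) -> str:
    """Expand <image> attributes into an <img>, optionally linked and captioned."""
    img = f"<img src={quoteattr(attrs['src'])} alt={quoteattr(attrs.get('alt', ''))}"
    if "title" in attrs:
        img += f" title={quoteattr(attrs['title'])}"
    img += "/>"
    if "href" in attrs:  # make the image a link to the larger version
        img = f"<a href={quoteattr(attrs['href'])}>{img}</a>"
    parts = ['<div class="image">', img]  # class names are hypothetical
    if "caption" in attrs:
        parts.append(f'<div class="caption">{escape(attrs["caption"])}</div>')
    parts.append("</div>")
    return "\n".join(parts)


print(render_image({"src": "small.jpg", "href": "large.jpg", "alt": "Trees in fog",
                    "caption": "Ghostly vegetation"}))
```

Keeping this in one function is what makes a later design change cheap: only the renderer changes, and the posts are reprocessed.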

media

The <media> element allows the inclusion of audio data, and it may eventually be expanded to other media types[5]. This element will create the presentation <div> elements, the HTML <audio> element and an optional caption. It will also convert the original audio into mp3 and ogg to maximise browser compatibility.

<media>
src: audio/17-open.mp3
caption: Olympus 17 burst - F1.8
</media>
Olympus 17 burst - F1.8
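The generated HTML presumably offers both formats and lets the browser pick one. A Python sketch of that output (not DmBlog’s Perl; the class names are invented, and the audio conversion step itself is not shown) might be:

```python
from pathlib import PurePosixPath
from typing import Optional
from xml.sax.saxutils import escape, quoteattr

# One <source> per format; the browser plays the first type it supports.
MIME_TYPES = {".mp3": "audio/mpeg", ".ogg": "audio/ogg"}


def render_media(src: str, caption: Optional[str] = None) -> str:
    """Emit an <audio> element with mp3 and ogg sources and an optional caption."""
    stem = str(PurePosixPath(src).with_suffix(""))
    lines = ['<div class="media">', '<audio controls="controls">']
    for suffix, mime in MIME_TYPES.items():
        lines.append(f'  <source src={quoteattr(stem + suffix)} type="{mime}"/>')
    lines.append("</audio>")
    if caption:
        lines.append(f'<div class="caption">{escape(caption)}</div>')
    lines.append("</div>")
    return "\n".join(lines)


print(render_media("audio/17-open.mp3", "Olympus 17 burst - F1.8"))
```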

Final steps

After all the processing is complete the renderer adds <section> tags to improve the semantic description of the page. It also adds unique IDs to the sections to allow direct links and the ‘next section’/‘previous section’ navigation arrows.
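A rough Python sketch of this final pass is below (DmBlog is Perl, and works on parsed XML rather than lines of text). It assumes sections start at <h2> headings and that IDs are slugs of the heading text, with a numeric suffix for repeats. Both are guesses made for the example.

```python
import re
from typing import Dict, List


def add_sections(html: str) -> str:
    """Wrap each <h2> and the content after it in a <section> with a unique id."""
    out: List[str] = []
    seen: Dict[str, int] = {}
    open_section = False
    for line in html.splitlines():
        heading = re.match(r"<h2>(.*?)</h2>", line)
        if heading:
            if open_section:
                out.append("</section>")
            slug = re.sub(r"[^a-z0-9]+", "-", heading[1].lower()).strip("-") or "section"
            seen[slug] = seen.get(slug, 0) + 1
            if seen[slug] > 1:  # repeated heading text still gets a distinct id
                slug = f"{slug}-{seen[slug]}"
            out.append(f'<section id="{slug}">')
            open_section = True
        out.append(line)
    if open_section:
        out.append("</section>")
    return "\n".join(out)


print(add_sections("<h2>Final steps</h2>\n<p>Text.</p>\n<h2>What's next</h2>\n<p>More.</p>"))
```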

What’s next

I’m going to live with DmBlog for a little while, see what works and what doesn’t.

Longer term I’d like it to form the basis for a more flexible system that could model more of my site. At the moment duncanmartin.com pages are one of three types: purely static pages, blog pages generated by DmBlog and the music quality survey pages, which are script generated. The purely static pages are the most annoying to maintain, as any changes I want to make site wide require either a lot of manual editing or a custom script to manipulate the XHTML.

There are also other features I want to build in such as detecting dead http:// links at processing time and automatically validating the final HTML document.

  1. Around 150KB of Perl
  2. And to output &lt;html&gt; to make that point the syntax would be &amp;lt;html&amp;gt; and so on
  3. The nice thing about regular expressions is that kind of detection is as easy as saying:
    $text =~ /\G(.*?\s)?(\w*)(?<!\\)({((?<!\\){)?)/gcso
  4. I never said they couldn’t
  5. You could argue that images should also be rendered by <media>