Setting Up Pelican - The Python Static Site Generator

2016-01-01 Last updated 2018-05-28 By Edgar Hassler

I had a debate with my friend David about how to set up my personal web site. My goal was to publish more of the things I'm working on, and to do it with static pages so that it's fast. I wanted the ease of use that accompanies things like WordPress, but I also wanted performance and easy customization. In retrospect I was willing to put in way too much effort to achieve this, but such is life.

David kept pushing me to use Jekyll, which I didn't want to do because Ruby. I'm willing to put up with some weird things in programming languages, but I can't get on board with functions having a single argument not using ('s or )'s. That is a bridge too far, friend! I chose Pelican because it seemed like a nice Python equivalent, it was mature, and pelicans are cool birds overall.

Originally I used a program called MarkdownPad 2 to edit my Markdown. That software allowed for integrating custom headers into the rendering, so I was able to include MathJax for rendering things like $\int f(x) \mathrm{d}x$ as well as my own LaTeX templates that I like to use. The downside was that I'd do a lot of work rendering and wrangling graphics. The overhead of this chore (and perhaps an NDA or two) was enough to discourage me from blogging very often.

An alternative approach I had seen people use was Jupyter Notebook with nbconvert to automate converting notebooks to blog posts. I settled on this approach since it made creating and editing posts far easier. Individual images are automatically base64 encoded into pages. Vector graphics from Graphviz get embedded as SVGs. Things just work out well.
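To make the image embedding concrete, here is a minimal sketch of the idea using only the Python standard library. It is not nbconvert's code: the helper name png_data_uri and the sample bytes are made up for illustration. The point is that the image bytes travel inside the HTML itself, so a post has no separate image files to manage.

```python
import base64


def png_data_uri(png_bytes: bytes) -> str:
    """Return an <img> tag whose source is the image itself, base64 encoded."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return '<img src="data:image/png;base64,{}">'.format(encoded)


# The 8-byte PNG signature stands in for a real plot here (hypothetical data).
fake_png = b"\x89PNG\r\n\x1a\n"
tag = png_data_uri(fake_png)
print(tag)
```

The trade-off is page size: base64 text is roughly a third larger than the raw bytes it encodes.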

An additional improvement to my setup was dockerizing my workflow. Pelican is biased towards Linux and macOS systems and, while you can make things work on Windows with alternatives, it's just kind of conceptually unpleasant. Docker gives me a rock-solid environment that's portable. It is icing on the cake that I can use Jupyter terminals to change the environment in real time.

Docker Container

My strategy was to create a Docker container and mount my Pelican content into it. Then I would use Jupyter to edit my content and to get terminal access into the instance to make changes. The nice thing about Jupyter and Docker is that I can install things, generate the notebook, then reload the container and keep the work without having my system polluted by whatever I was working on.

I created a Dockerfile to hold the installation information as follows:

FROM python:3-slim
COPY ./requirements.txt /root/requirements.txt
COPY ./entrypoint.bash /root/entrypoint.bash
COPY ./jupyter_notebook_config.json /root/.jupyter/jupyter_notebook_config.json
RUN apt-get update \
    && apt-get install --no-install-recommends -y \
        make openssh-client ca-certificates openssl git graphviz \
    && pip3 install -r /root/requirements.txt \
    && jupyter contrib nbextension install --system \
    && jupyter nbextension enable codefolding/main \
    && jupyter nbextension enable hide_input/main \
    && chmod +x /root/entrypoint.bash
WORKDIR /pelican
EXPOSE 8000
EXPOSE 8888
ENTRYPOINT ["/root/entrypoint.bash"]
CMD ["devserver"]

This all runs off an entry point bash script as follows:

#!/bin/bash

function waiting {
    make $1
    if [ -f "srv.pid" ]; then
        while kill -0 "$(cat srv.pid)"; do
            sleep 0.5; 
        done
    fi
}


jupyter notebook \
        --allow-root \
        --no-browser \
        --ip=0.0.0.0 \
        --port=8888 \
        --notebook-dir=/pelican/content/blog &


case $1 in
    devserver)
        waiting $1;;
    serve)
        waiting $1;;
    serve-global)
        waiting $1;;
    html)
        make html;;
    clean)
        make clean;;
    regenerate)
        make regenerate;;
    publish)
        make publish;;
    ssh_upload)
        make ssh_upload;;
    bash)
        /bin/bash;;
    *)
        echo "Unknown"
esac
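The waiting function in the script above relies on a small Unix trick: kill -0 sends no signal at all and only reports whether the process still exists. Since a container stops when its entry point returns, polling the PID file keeps the container alive for as long as the server process runs. A minimal Python rendering of the same loop, for illustration only (the function names are hypothetical and the sketch assumes a POSIX system):

```python
import os
import time


def pid_is_alive(pid: int) -> bool:
    """Signal 0 delivers nothing; it only checks that the process exists."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def wait_for_pid_file(path: str = "srv.pid", poll_seconds: float = 0.5) -> None:
    """Block until the process named in the PID file exits (or there is no file)."""
    if not os.path.isfile(path):
        return
    with open(path) as handle:
        pid = int(handle.read().strip())
    while pid_is_alive(pid):
        time.sleep(poll_seconds)
```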

One point of interest in the entry point script: I need the --ip=0.0.0.0 argument to make Jupyter run. Leaving out this argument causes a failure to bind to the network, and using 127.0.0.1 makes it inaccessible from outside the container.
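The difference between the two addresses can be sketched with plain sockets. This is an illustration of the general networking behaviour, not Jupyter's code, and the helper name bind_once is made up. Inside a container, 127.0.0.1 is the container's own loopback interface, while ports published with docker run -p arrive on a different interface, so a server bound only to loopback never sees them.

```python
import socket


def bind_once(host: str) -> str:
    """Bind a TCP socket to host on any free port and report the bound address."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 lets the OS choose a free port
        return sock.getsockname()[0]


# Loopback only: reachable from inside the container, not through "docker run -p".
print(bind_once("127.0.0.1"))
# All interfaces: includes the one Docker forwards published ports to.
print(bind_once("0.0.0.0"))
```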

To make this all work, I build the image with docker build -t pelican . on Linux or docker build -t pelican %CD% on Windows. Then I run the container with:

docker run -v ${PATH_TO_PELICAN}:/pelican -p 8000:8000 -p 8888:8888 -d --rm pelican devserver
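For anyone who wants the same command on every platform without shell-specific variables such as ${PATH_TO_PELICAN} or %CD%, one option is to assemble the arguments in Python. This is a hedged sketch rather than part of the original setup: the function name docker_run_args is hypothetical, and it only builds the argument list; the comments spell out what each flag in the command above does.

```python
from pathlib import Path


def docker_run_args(site_dir: Path, target: str = "devserver") -> list:
    """Assemble the argument list for the docker run command shown above."""
    return [
        "docker", "run",
        "-v", "{}:/pelican".format(site_dir.resolve()),  # mount the site into the container
        "-p", "8000:8000",   # Pelican development server
        "-p", "8888:8888",   # Jupyter Notebook
        "-d",                # run in the background
        "--rm",              # remove the container when it stops
        "pelican",           # image tag from "docker build -t pelican"
        target,              # passed to entrypoint.bash as $1
    ]


args = docker_run_args(Path("."))
print(" ".join(args))
# To actually launch it: subprocess.run(args, check=True)
```

Passing a list to subprocess.run instead of a single string sidesteps quoting problems when the site path contains spaces.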

Custom Theme

I put together a custom theme with a few features I wanted. Pelican uses Jinja2 for its template language. Blog posts have a metadata section at the top that gets passed into the templates. This contains things like the title, publication date, and tags for the page, but we can also pass in anything we want. I added a "warn" field that causes the template to notify the reader (in large red highlighting with a bold font) if the document is a draft.

As an example, I wanted to enable comments on a per-post basis, so I added some templating to include Disqus when the comments metadata value was set to true on a post:

{% if article and article.comments and article.comments == 'true' %}
    <!-- Disqus article -->
{% endif %}
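One subtlety in that condition is the comparison against the string 'true'. Jinja2's == behaves like Python's, so the check can be mirrored in plain Python to see which metadata values pass. This is a sketch under the assumption that the value reaches the template with whatever type the reader produced; comments_enabled is a hypothetical helper, not part of the theme.

```python
from types import SimpleNamespace


def comments_enabled(article) -> bool:
    """Mirror the template test: article exists, has comments, and it equals 'true'."""
    return bool(article and getattr(article, "comments", None) == "true")


print(comments_enabled(SimpleNamespace(comments="true")))  # the string the template expects
print(comments_enabled(SimpleNamespace(comments="True")))  # different case, so no match
print(comments_enabled(SimpleNamespace(comments=True)))    # a real boolean is not the string
print(comments_enabled(SimpleNamespace()))                 # field missing entirely
```

Only the first call passes. In particular, a JSON boolean in notebook metadata would load as Python True and not enable comments, so the value needs to be the quoted string "true".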

Jinja2 was a joy to work with. It's not as easy as something more native like PHP, but it has a certain elegance and non-PHP-ness to it that I appreciate.

Pelican Plugins

Pelican uses pelicanconf.py and publishconf.py to configure the build process, including which plugins to use. Pelican has a pretty nice spread of plugins, but I only use a handful of them, specifically:

  • pelican-cite: BibTeX citations like natbib.
  • render_math: MathJax package.
  • series: Allows you to make an article part of a series, and maintains the series manifest at the bottom of each member page.
  • related_posts: Allows you to target related posts and produces links to those posts at the bottom of the screen.
  • share_post: Creates a set of sharing icons for social media services.

In addition, I wrote my own plugin to assist in converting Jupyter notebooks into embeddable HTML for my site. There are pre-existing plugins that do this, but I found them lacking in producing embeddable output and honoring the nbextensions.
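Enabling plugins comes down to two settings in pelicanconf.py, PLUGIN_PATHS and PLUGINS. The fragment below is a sketch, not a copy of the actual configuration: the "plugins" folder location is an assumption, and the plugin names simply follow the list above, so each must match the directory or module name as installed.

```python
# pelicanconf.py (fragment)

# Folder that holds the plugin directories; "plugins" is an assumed location.
PLUGIN_PATHS = ["plugins"]

# Names follow the list above; each must match a package found in PLUGIN_PATHS.
PLUGINS = [
    "pelican-cite",
    "render_math",
    "series",
    "related_posts",
    "share_post",
    "edgar_ipynb",  # the custom notebook reader described later in this post
]
```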

Pelican Cite

The pelican-cite plugin allows you to use [@name] and [@@name] to pull in citations in the style of natbib for LaTeX, and it uses your BibTeX database. I downloaded the plugin and put it into a plugins folder next to my content. A note about pelican-cite: it will not run on drafts, only on published articles. This gave me a few hours of difficulty. Also, you have to maintain a central BibTeX database, whereas I'd rather have BibTeX entries in each post. But it works!
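Because citations are not rendered on drafts, it can help to check a draft's citation keys by hand against the central database before publishing. The sketch below is not pelican-cite's parser: it is a standalone regular expression that assumes keys contain no whitespace, brackets, or @ characters, and the keys in the sample text are invented.

```python
import re

# Matches [@key] and [@@key]; the key is anything up to the closing bracket.
CITE_PATTERN = re.compile(r"\[(@{1,2})([^\]\s@]+)\]")


def cited_keys(text: str) -> list:
    """Return the citation keys in order of first appearance, without duplicates."""
    seen = []
    for _marker, key in CITE_PATTERN.findall(text):
        if key not in seen:
            seen.append(key)
    return seen


# Hypothetical draft text with made-up keys.
sample = "Shown by [@box1976] and later revisited [@@tukey1977]; see also [@box1976]."
print(cited_keys(sample))
```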

Custom Plugin for NBConvert

The nbconvert template for nbextensions is based on the "full" template instead of the basic one, so I copied the template and changed the first line to {%- extends 'basic.tpl' -%}.

To construct the pelican plugin I took pelican-ipynb and stripped out a lot of the code. This left me with

import logging
import os
import json
from pelican import signals
from pelican.readers import BaseReader
import nbformat
from nbconvert import HTMLExporter
from traitlets.config import Config


_TEMPLATE = os.path.join(os.path.dirname(__file__), 'edgar_ipynb.tpl')


def register():
    def add_reader(arg):
        arg.settings["READERS"]["ipynb"] = IPythonNB
    signals.initialized.connect(add_reader)


class IPythonNB(BaseReader):
    enabled = True
    file_extensions = ['ipynb']

    def read(self, filepath):
        metadata = {}
        metadata['ipython'] = True
        # Use a context manager so the notebook file is always closed
        with open(filepath, encoding='utf-8') as ipynb_file:
            notebook = json.load(ipynb_file)
        notebook_metadata = notebook['metadata']
        notebook_format = notebook['nbformat']
        for key, value in notebook_metadata.items():
            key = key.lower()
            # Custom keys are copied too
            if key in ("title", "date", "modified", "category", "status", "tags", "slug", "author", "warn", "comments"):
                metadata[key] = self.process_metadata(key, value)
        keys = [k.lower() for k in metadata.keys()]
        if not {'title', 'date'}.issubset(set(keys)):
            raise ValueError('Could not find "title" or "date" in metadata for {}, but did find {}'.format(filepath, repr(metadata)))
        c = Config()
        c.HTMLExporter.preprocessors = [
            "jupyter_contrib_nbextensions.nbconvert_support.CodeFoldingPreprocessor",
            "jupyter_contrib_nbextensions.nbconvert_support.PyMarkdownPreprocessor"
        ]
        nb = nbformat.read(filepath, as_version=notebook_format)
        html_exporter = HTMLExporter(config=c)
        html_exporter.template_file = _TEMPLATE
        content, resources = html_exporter.from_notebook_node(nb)

        return content, metadata

I added these two files in an edgar_ipynb directory in the pelican plugins folder and was off to the races.
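Since the reader pulls the post fields from the notebook's top-level metadata and refuses notebooks without a title and date, those fields have to be written into the .ipynb file itself. A minimal sketch of doing that on the notebook's JSON structure follows. The helper names and field values are hypothetical; the required keys match the check in the reader above.

```python
import json

REQUIRED_KEYS = {"title", "date"}


def add_post_metadata(notebook: dict, **fields: str) -> dict:
    """Return a copy of the notebook dict with post fields merged into its metadata."""
    updated = dict(notebook)
    updated["metadata"] = {**notebook.get("metadata", {}), **fields}
    return updated


def missing_keys(notebook: dict) -> set:
    """Report which required keys the reader would fail to find."""
    present = {key.lower() for key in notebook.get("metadata", {})}
    return REQUIRED_KEYS - present


# A bare-bones notebook structure; a real one also carries cells and a kernelspec.
notebook = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 2}
notebook = add_post_metadata(
    notebook, title="Example Post", date="2018-05-28", tags="pelican, jupyter", comments="true"
)
print(json.dumps(notebook["metadata"], indent=2))
print(missing_keys(notebook))
```

The same fields can also be edited by hand through the notebook metadata editor in Jupyter; the sketch just shows where they need to end up.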

Conclusions

Overall this method isn't perfect either, but it's a lot easier than my previous attempts. A previous version of this post discussed using Docker more as a tool to modify the source than as a server for the host, but that requires a lot of boilerplate and there is no easy cross-platform way to do it. I had originally attempted to do this all through WSL on Windows so that everything could be bash scripted, but running Docker from WSL requires enabling a remote connection to Docker on Windows, and then all the paths are relative to the Windows Docker server. This is problematic because mounting volumes triggers a permission request that you have to accept, followed by a second request for a password that stayed hidden for me and made it look like everything had stalled out.

Anyway, this is good enough for now.