Publishing a Pelican site to GitHub Pages using Fabric

Posted on Wed 19 October 2016 in Site

In the previous blog post we looked at setting up a Pelican site with Python 3 and Fabric on Fedora 24.

Now that we have a Pelican site up and running, I will show you how to publish it on GitHub Pages using a sleek Fabric task.

Review of existing approaches

There are several ways to publish a Pelican site to GitHub Pages.

The Pelican authors recommend using the ghp-import tool to import the contents of the output directory into a special gh-pages git branch, which can then be pushed to the appropriate branch of the GitHub Pages repository (i.e. the master branch for User pages or the gh-pages branch for Project pages).

Ankur Sinha wrote an excellent article for Fedora Magazine, where he recommends creating two git repositories, the main repository containing the source of the Pelican page and the second repository containing the contents of the output directory. The recommendation is to add the second repository as a submodule of the main repository.

Neither of these approaches satisfied my needs. The downside of the ghp-import tool is that it is not packaged for Fedora yet (review request) and that it destroys the gh-pages branch on each run, so previous versions of the site cannot be kept as older commits. Tracking the contents of the output directory as a submodule in a separate git repository, on the other hand, requires updating the submodule reference in the main repository every time a new version of the site is built, which leads to a large number of "submodule bump" commits in the main repository.

A new approach using a custom Fabric task

Therefore, I crafted a new approach that tries to overcome these disadvantages. As in Ankur Sinha's article, I created two git repositories: the main repository containing the source of the Pelican site and a second repository containing the contents of the output directory. However, instead of linking the repositories via a submodule, I created a custom Fabric task that rebuilds the site, commits the output to the second git repository and pushes it to GitHub Pages.

To use this approach, first create the username.github.io repository on GitHub.

Then generate a clean production version of the site:

fab clean
fab preview

Add the contents of the output directory to the username.github.io git repository:

cd output
git init
git add --all
git commit -m "Initial commit"
git remote add origin git@github.com:<username>/<username>.github.io.git
git push origin master

Edit publishconf.py and remove the following line:

DELETE_OUTPUT_DIRECTORY = True

This will prevent Pelican from deleting the whole output directory, including the git repository initialized in the previous step, when building the production version of the site.

Modify the clean function in Fabric's fabfile.py to not delete the .git directory:

def clean():
    """Remove generated files"""
    for root, dirs, files in os.walk(env.deploy_path):
        for name in dirs[:]:
            # Do not recurse into this directory
            dirs.remove(name)
            if name == '.git':
                # Do not remove .git/ directory
                pass
            else:
                shutil.rmtree(os.path.join(root, name))
        for name in files:
            os.remove(os.path.join(root, name))
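
Before pointing a function like this at a real output directory, it may help to try the same logic on a throwaway directory. The following is a minimal standalone sketch; the names clean_dir and demo and the sample file names are assumptions made for illustration, while the real task operates on env.deploy_path.

```python
import os
import shutil
import tempfile


def clean_dir(deploy_path):
    """Remove generated files under deploy_path but keep its .git directory."""
    for root, dirs, files in os.walk(deploy_path):
        for name in dirs[:]:
            # Removing the name from dirs stops os.walk from descending into it
            dirs.remove(name)
            if name != '.git':
                shutil.rmtree(os.path.join(root, name))
        for name in files:
            os.remove(os.path.join(root, name))


def demo():
    """Build a throwaway output directory, clean it, and list what is left."""
    output = tempfile.mkdtemp()
    try:
        os.makedirs(os.path.join(output, '.git', 'objects'))
        os.makedirs(os.path.join(output, 'theme', 'css'))
        for rel in ('index.html',
                    os.path.join('theme', 'css', 'main.css'),
                    os.path.join('.git', 'HEAD')):
            with open(os.path.join(output, rel), 'w') as handle:
                handle.write('placeholder')
        clean_dir(output)
        return sorted(os.listdir(output))
    finally:
        # Always remove the temporary directory, even if cleaning failed
        shutil.rmtree(output)


if __name__ == '__main__':
    print(demo())
```

Because every directory name is removed from dirs during the first iteration, os.walk never descends below the top level, so only the top-level .git directory is spared.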

Add the following gh_pages function (and its publish alias) to Fabric's fabfile.py:

from fabric.contrib.console import confirm

# Get absolute path of project's root directory
env.project_root = os.path.dirname(env.real_fabfile)
# Set absolute path of project's deploy directory
env.deploy_path = os.path.join(env.project_root, 'output')

# GitHub Pages configuration
env.github_pages_branch = 'master'

def gh_pages():
    """Publish to GitHub Pages"""
    with lcd(env.project_root):
        # ensure the main git repository is clean
        main_git_unclean = local('git status --untracked-files=no --porcelain',
                                 capture=True)
        if main_git_unclean:
            abort("\n".join(["The main git repository is not clean:",
                             main_git_unclean]))
        # get main git repository's HEAD's sha checksum
        main_commit_sha = local('git rev-parse --short HEAD', capture=True)

    with lcd(env.deploy_path):
        # sync local GitHub Pages git repository with remote repository
        local('git fetch origin {github_pages_branch}'.format(**env))
        local('git reset --hard origin/{github_pages_branch}'.format(**env))

    clean()
    # build a production version of the site
    preview()

    with lcd(env.deploy_path):
        pages_git_unclean = local('git status --porcelain', capture=True)
        if pages_git_unclean:
            local('git add --all')
            local('git commit -m "Build of source repo @ {}"'.format(main_commit_sha))
            if confirm("Do you wish to publish the current version of the "
                       "page to GitHub Pages?", default=False):
                local('git push origin {github_pages_branch}'.format(**env))
                commit_sha = local('git rev-parse --short HEAD', capture=True)
                puts("Pushed commit {} to GitHub Pages".format(commit_sha))
            else:
                # reset the local branch to the state published on GitHub Pages
                local('git reset origin/{github_pages_branch}'.format(**env))
                puts("Exiting on user request.")
        else:
            puts("Nothing has changed. Exiting.")

def publish():
    """Publish to GitHub Pages"""
    gh_pages()
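
The first step of gh_pages, refusing to build from a dirty source repository and recording the source commit, does not depend on Fabric itself. Here is a minimal sketch using only the standard library, assuming git is available on the PATH. The helper names (git, source_commit_for_build, build_commit_message) and the run parameter are hypothetical; run exists only so that the logic can be exercised without a real repository.

```python
import subprocess


def git(args, cwd, run=subprocess.check_output):
    """Run a git command in cwd and return its stripped text output."""
    return run(['git'] + list(args), cwd=cwd).decode().strip()


def source_commit_for_build(project_root, run=subprocess.check_output):
    """Return the short HEAD sha of a clean source repo, or raise if dirty."""
    dirty = git(['status', '--untracked-files=no', '--porcelain'],
                project_root, run)
    if dirty:
        raise RuntimeError('The main git repository is not clean:\n' + dirty)
    return git(['rev-parse', '--short', 'HEAD'], project_root, run)


def build_commit_message(sha):
    """Format the commit message recorded in the GitHub Pages repository."""
    return 'Build of source repo @ {}'.format(sha)
```

Recording the source commit in each build's commit message is what ties a published version of the site back to the source revision it was built from.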

To publish the page, run:

fab publish

and answer y when prompted.

Visit the page at https://username.github.io and share it with the world!

Setting up a custom domain

To use a custom domain with GitHub Pages, e.g. blog.mysite.com, one needs to add a CNAME file in the root of the generated site which tells GitHub Pages on which custom domain the site is hosted.

The contents of the file should match the custom domain name. For the example custom domain above, one would create the file content/extra/CNAME with the following content:

blog.mysite.com

To instruct Pelican to copy the CNAME file to the site's root, list it among the site's static paths:

STATIC_PATHS = [
    ... other static paths ...
    # GitHub Pages custom domain
    'extra/CNAME',
]

and annotate it with extra path metadata:

EXTRA_PATH_METADATA = {
    ... other extra path metadata ...
    'extra/CNAME': {'path': 'CNAME'},
}

Finally, change the value of the SITEURL variable in publishconf.py to the custom domain's URL (e.g. http://blog.mysite.com) and commit the changes:

git add content/extra/CNAME
git commit -a -m "Change site's URL to http://blog.mysite.com"
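
As an illustration, the relevant line in publishconf.py might then read as follows. This assumes the example domain used in this post and plain HTTP, as in the commit message above; the value is written without a trailing slash.

```python
# publishconf.py (fragment); the domain is the example one used in this post
SITEURL = 'http://blog.mysite.com'
```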

Before we publish the new version of the site with a custom domain, we need to configure an appropriate DNS record with our DNS provider. The above example custom domain is a custom subdomain, so we need to set up a CNAME record with our DNS provider that points to username.github.io. Follow your DNS provider's instructions on how to do that.

To confirm that the new DNS record is set up correctly, use the dig utility:

dig +nocmd +nostats +nocomments blog.mysite.com

and make sure the output is similar to:

;blog.mysite.com.               IN  A
blog.mysite.com.        1747    IN  CNAME   username.github.io.
username.github.io.     3547    IN  CNAME   github.map.fastly.net.
github.map.fastly.net.  650     IN  CNAME   prod.github.map.fastlylb.net.
prod.github.map.fastlylb.net. 17 IN A       151.101.12.133
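
If you would rather check the output programmatically than read it by eye, the answer lines can be parsed with a few lines of Python. This is a minimal sketch that assumes dig's output has the five-column shape shown above; the function names cname_chain and points_to are hypothetical, and SAMPLE simply repeats the example output.

```python
SAMPLE = """\
;blog.mysite.com.               IN  A
blog.mysite.com.        1747    IN  CNAME   username.github.io.
username.github.io.     3547    IN  CNAME   github.map.fastly.net.
github.map.fastly.net.  650     IN  CNAME   prod.github.map.fastlylb.net.
prod.github.map.fastlylb.net. 17 IN A       151.101.12.133
"""


def cname_chain(dig_output):
    """Map each name to its CNAME target, based on dig's answer lines."""
    chain = {}
    for line in dig_output.splitlines():
        fields = line.split()
        # Answer lines look like: name TTL IN CNAME target; ';' lines are comments
        if (len(fields) == 5 and not line.startswith(';')
                and fields[3] == 'CNAME'):
            chain[fields[0].rstrip('.')] = fields[4].rstrip('.')
    return chain


def points_to(dig_output, domain, target):
    """Return True if domain has a CNAME record pointing directly at target."""
    return cname_chain(dig_output).get(domain) == target


if __name__ == '__main__':
    print(points_to(SAMPLE, 'blog.mysite.com', 'username.github.io'))
```

To check a real record, pass the text printed by the dig command above instead of SAMPLE.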

If your custom domain is an apex domain (e.g. mysite.com), you will need to configure a different type of DNS record. See the GitHub Pages Help on Setting up an apex domain.

After DNS is properly configured, publish the updated site to GitHub Pages with:

fab publish

Visit the page at your custom domain!

Note

At the time of writing, it is not possible to set up a custom domain with SSL support via GitHub Pages. For more information, see the issue report with a lengthy discussion.

To see this implemented in practice, browse the source repo of my Pelican site.