Aug 12, 2019

Hello, Hugo

For the past 11(!) years I’ve hosted my blog on wordpress.com, which had a very low barrier to entry. But for a year or two now I’ve been itching to move on to something that I actually enjoy working with, and that doesn’t leave me locked in to the WordPress platform and hosting services. Over the past two weeks or so I’ve been working on setting up this new blog, which is hosted on GitHub Pages and is generated as a static site using Hugo. If you’re not familiar with the term “static site”, what this basically means is that I use the Hugo executable to convert source markdown files into pure HTML files that I can load onto whichever host I want, without any crazy scripting or server-side stuff to generate that HTML on-the-fly whenever a user loads the web page. From my point of view I get the following benefits:

I write the posts using markdown in my text editor of choice instead of fighting with a WYSIWYG editor (with fast live reloading provided by the Hugo local server)
The site that people see is clean and fast to load/render, with minimal JavaScript
I can put the generated HTML on whichever host I please
I can choose from a wide selection of themes with cool features, and customize the theme and individual posts by dropping down to custom HTML/CSS/JavaScript if I need to
I don’t have to pay money just to keep my blog from serving viewers with ads that have potentially malicious code embedded in them

From now on, all new posts will be hosted here on GitHub Pages. While I’m sure it’s annoying for everyone to update their RSS feeds and such, hopefully you all can appreciate the benefits on my end. I have no plans to shut down the old blog, so any old links and such will continue to work until WordPress decides to shut down their free hosting. The only thing that will change over there is that I’m going to stop paying for the ad-free version since I don’t want to keep paying for that indefinitely.

For the rest of this post I’m going to describe my experiences with migrating my blog to Hugo and GitHub Pages. If you’re not interested in that, feel free to move on. Just make sure to update your RSS feed first!

Setting Up Hugo and GitHub Pages

Hugo is actually really easy to work with, at least if you’re a programmer like me. It’s all contained in a single executable that you can download from the releases page on GitHub, so there’s no need to compile things from source or go through a multi-step installation process. Hugo has comprehensive docs that you can read through, and also has a quick start guide that can get you up and running with minimal fuss. I set things up so that I have two git repositories hosted on GitHub:

A private repo that contains all of the source markdown/theme/image files that Hugo uses to generate the static site HTML
A public repo that has the generated HTML files, using GitHub Pages to host

The private repo isn’t strictly necessary. I like it because it gives me version control on my source content, and it makes it easy to work on my blog posts across multiple computers. The public repo is necessary for using GitHub Pages, but that could easily use any kind of generic hosting service instead. GitHub Pages is nice because it’s free, and because it uses standard git functionality to push new changes to the live site. Really it’s up to you to decide how you want to manage your source content and host the final generated files.

For LaTeX/math content, I integrated KaTeX by referencing their auto-render extension. This was pretty straightforward despite me knowing next to nothing about JS or web development: the theme that I’m currently using lets you add a partial HTML file that gets added as an extra footer for every post, so I used that to reference the KaTeX script. In fact I even got it working so that the script is conditionally included when the post enables a “math” config variable in the front matter, which means that the script only gets loaded for posts that actually need it. LaTeX syntax within markdown has some well-known issues that can make it a bit clunky to work with, but fortunately it’s very easy to have Hugo use the Mmark parser instead of the default parser. Mmark lets you avoid having to escape special characters like _ that frequently pop up in LaTeX, which can save some hair-pulling.

Migrating WordPress Content

This was the hard part of getting things going on my new blog. I strongly considered not converting the old posts at all, and just leaving them on the WordPress-hosted site. But ultimately I decided that it would be for the best to have all of those posts converted to a format that was more easily movable to any kind of hosting. My first attempt at converting my old posts was to run the exitwp-for-hugo Python script on the exported XML from my WordPress blog. This script caused a segfault when I ran it (not a Python exception, an actual segfault!), so I quickly gave up on it. I had more luck with the blog2md script, which actually ran and produced some markdown files that I could use. blog2md uses Node.js, which I previously had zero experience working with. Fortunately I had an Ubuntu installation that I could access through WSL, which made it relatively easy to install Node.js and npm through Ubuntu’s Advanced Package Tool (apt). Despite some warnings about a deprecated package, the script produced 80 separate markdown files that roughly converted the content present in the exported XML file that WordPress produced for me. While it was great that this got me started, I unfortunately had to do a lot of manual fixing and cleanup to get things looking presentable. In no particular order, these are the main issues that I had to deal with:

The WordPress export seemed to be wildly inconsistent with how it handled newlines. I have no idea if this is because of something I did when I typed the original blog posts, but most of the time the exported XML would just contain a newline character in the text instead of an HTML line break (<br>). The library being used internally by blog2md would just straight-up ignore newlines, collapsing many of my posts into gigantic single paragraphs. I worked around this by writing a pre-process script to add a <br> wherever a newline character was found within the text body, which sometimes added more line breaks than needed. But this was a better starting point than having no newlines at all.
WordPress [code] blocks did not get converted by the script, so I went through and manually converted those to markdown code blocks with c/cpp formatting.
The script was helpful in trying to proactively add the escape character (\) in cases where it would cause issues with special markdown characters like [, _, and *. While this was fine (and desirable) for normal text, it unfortunately applied this to all of my code blocks as well. This meant I had to manually go in and remove the escape characters. This was rather tedious since I couldn’t just blindly search and replace, and could only do it within code blocks.
Tables from the WordPress export weren’t handled at all, so I had to manually convert them using the extended syntax for tables in markdown.
Footnotes weren’t handled, but these were pretty easy to fix up by converting them to use the markdown syntax for footnotes.
All of the images work in the converted posts, but they just point to the images hosted on the old wordpress.com site. For now I’m just going to leave this since I don’t think it will cause any problems in the short term, but ideally I could somehow scrape all of those images so that I can re-host them and make it all easy to migrate in the case that wordpress.com ever shuts down. If anybody knows a thing or two about doing this, please let me know!
Speaking of images, a few of the posts from my series about GPU barriers used a WordPress slideshow to allow the reader to flip through the synchronization examples. This obviously can’t be converted to markdown, but fortunately I was able to quickly integrate hugo-easy-gallery by Li-Wen Yip. It also helped that I still have local copies of all of the source images that I generated for the slideshow. 😃
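
For illustration, a few of the mechanical fixes from the list above could be scripted roughly like this. This is a minimal Python sketch under stated assumptions, not the actual pre-process script: the function names are hypothetical, the set of escaped characters is assumed to be just [, _, and *, code blocks are assumed to already use triple-backtick fences, and the old images are assumed to live on a host ending in wordpress.com.

```python
import re
from urllib.parse import urlparse

# Assumption: the converter only backslash-escaped these three characters.
ESCAPED_CHAR = re.compile(r"\\([\[_*])")

# Matches markdown image links and captures the URL: ![alt](url)
IMAGE_LINK = re.compile(r"!\[[^\]]*\]\(([^)\s]+)")


def newlines_to_br(body: str) -> str:
    """Pre-process step: put a <br> in front of every newline in the post body."""
    return body.replace("\r\n", "\n").replace("\n", "<br>\n")


def unescape_code_blocks(markdown: str) -> str:
    """Post-process step: drop the escape characters, but only inside fenced code blocks."""
    out = []
    in_fence = False
    for line in markdown.split("\n"):
        if line.lstrip().startswith("```"):
            # A fence line toggles whether we are inside a code block.
            in_fence = not in_fence
            out.append(line)
        elif in_fence:
            out.append(ESCAPED_CHAR.sub(r"\1", line))
        else:
            # Normal text keeps its escapes, since markdown needs them there.
            out.append(line)
    return "\n".join(out)


def wordpress_image_urls(markdown: str) -> list[str]:
    """Collect the unique image URLs that still point at a wordpress.com host."""
    urls = []
    for url in IMAGE_LINK.findall(markdown):
        host = urlparse(url).hostname or ""
        if host.endswith("wordpress.com") and url not in urls:
            urls.append(url)
    return urls
```

The first function would run on the exported text before conversion, and the other two on the generated markdown files afterwards. The URL list is only a starting point for re-hosting the images; actually downloading them and rewriting the links is left out here.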

Comments

A functional comments system is probably one of the strongest benefits of an out-of-the-box blogging platform like WordPress or Blogger, and I’ve always liked having them for when people share additional info or find a mistake in my post. Disqus is probably the most popular option for other blogs using Hugo, but I had some strong reservations about embedding that into my site. For a while I considered simply having no comments on the new blog. These days most of the discussion tends to happen on Twitter anyway, but I still wanted to have some sort of option for people who don’t like going on Twitter¹. For now I’m going to try using utterances and see how that goes. utterances basically uses a bot to create an issue on a backing GitHub repo, and then uses the GitHub API to add comments to that issue and display the contents right in the blog post. This means you get the ability to post rich text, and the backing data is all stored somewhere where anyone can access it and where I can also moderate it if necessary (comment spam was a huge issue on WordPress). The downside of course is that you need a GitHub account to actually comment, which is unfortunate. But hopefully that already accounts for a decent percentage of people who might want to read a graphics programming blog. 😄

That’s It

I hope you all like the new site! I’ve got a few posts on the backlog to catch up on, so hopefully I should have some actual new content coming up soon. Don’t forget to update your RSS feed if you’re using one!

¹ I also feel like Twitter is actively hostile to archiving threads for future reference. Searching is difficult, and it’s hard or impossible to fully read through the various reply chains that invariably break off in the larger threads. And then of course at any point someone who posted in that thread could decide to delete their Twitter thread and irrecoverably break the connection between the tweets.