Metadata, architecture, and complete site rebuilds
I know I said last post that I was going to write about TDC's development, but this post is not about that. Instead, it's about an issue I found with this blog and decided to resolve, before realising that the issue extended to the entire site, was entangled with several other issues, and would require essentially an entire rewrite.
The initial issue in question was search engine indexing. I wanted my blog posts and project pages to be accessible directly through search engines, but the current architecture of the site didn't have that capability.
For context, this site is a single-page application. This means that movement between unique 'pages' (I call them views) should not cause a page load in the user's browser; instead, the site dynamically loads, shows, and hides content as needed to provide what the user is requesting.
My initial method of doing so was using page anchors (`#something` affixed to the URL) and transitioning elements on an anchor change. For those not in the know, anchors are typically used for navigating around a page or holding your place in a longer document. The Wikipedia contents section is a good example of this.
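The anchor approach can be sketched roughly like this (a hypothetical illustration, not the site's actual code): the fragment maps to a view name, and a `hashchange` listener swaps views when it changes.

```javascript
// Hypothetical sketch of hash-based view routing: derive a view name
// from the URL fragment, falling back to the landing page.
function viewFromHash(hash) {
  const name = hash.replace(/^#/, '');
  return name === '' ? 'landing' : name;
}

// In the browser this would be wired up something like:
// window.addEventListener('hashchange', () =>
//   showView(viewFromHash(location.hash))); // showView: assumed display helper
```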
While anchors worked and provided the single-page functionality I wanted, they had a few issues:
- They were not crawled by search engines
- They didn't look as nice as a standard URL: `/blog/a-post` looks a lot better than `#blogpost_a-post`
  - Sort of a sub-issue, but they couldn't be sectioned super well either
- Their use meant normal usage of page anchors needed special handling
- The server has no knowledge of the client's anchor, meaning more logic needed to be implemented client-side
After some thought and discussion with people more experienced than myself, I had a plan to remedy all the above: PHP. The only issue was that I had never used it before and felt in way over my head. Nonetheless, I dove in and started testing out some PHP in my existing webpack environment. This proved to be the straw that broke the camel's back, where the camel is my poor webpack environment. It occurred to me that webpack was not built for the pipeline I was forcing it to oversee at about the moment it started constantly refreshing pages for no discernible reason. I had pushed webpack beyond its intended purpose and I was being punished for it. It was absolutely deflating, and it added another layer of complexity on the road to fixing my initial indexability issue: I had to set up a new build pipeline or go it raw.
Thankfully, after a few search attempts I found something that looked promising: gulp.js. When first reading about it, I honestly had no clue what it even did. The descriptions seemed extremely general and the code samples seemed of limited use, but I was desperate at this point and gave it a go. It turned out to be a great idea. Gulp.js is relatively low-level as far as tooling goes, but has a very lively ecosystem of plugins to provide functionality. It's easier and better supported than integrating all the functionality into my own custom tooling, but also flexible enough that the build process won't produce the bloodcurdling screams of dying software! When all was said and done, I had a new pipeline that was more stable, flexible, and approachable than before. I even ended up integrating webpack into the gulp pipeline in a way that it was actually meant to be used: bundling!
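For a sense of the shape such a pipeline takes, here's a hypothetical gulpfile sketch (task names, paths, and the webpack hand-off are all illustrative, not the site's real configuration): tasks are plain functions piping gulp's `src` streams to `dest`.

```javascript
// Hypothetical gulpfile sketch. Each task is an ordinary function; gulp's
// series() composes them, and watch() re-runs the build on changes.
const { src, dest, series, watch } = require('gulp');

function php() {
  // copy PHP view scripts straight through to the output directory
  return src('src/**/*.php').pipe(dest('dist'));
}

function scripts() {
  // in the real pipeline, bundling would be delegated to webpack here,
  // e.g. via a stream plugin; this sketch just copies the entry point
  return src('src/js/main.js').pipe(dest('dist/js'));
}

exports.build = series(php, scripts);
exports.default = () => watch('src/**/*', exports.build);
```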
Build pipeline re-jigged, I could finally tackle the issue I actually wanted to deal with. This was a process of many mistakes, lots of learning, and general confusion.
I started naively, trying to just PHP-ify small parts of the website and use the PHP strictly as a JS API. I pretty quickly realised that this method would cause potential issues with spiders, as they had to receive the proper view at load time, without JS interference. So, I modularised the site into many discrete PHP scripts, each handling and providing partial or full views ('pages'; this is going to come up more now). `index.php` fulfilled its role as an index more literally, parsing the requested URL and calling on the required scripts to fulfil it.
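The dispatch idea in `index.php` can be sketched as a pure function (shown here in JavaScript rather than PHP; the route patterns and script names are made up for illustration): match the requested path against known routes and return the script responsible for rendering it.

```javascript
// Hypothetical sketch of URL-to-view-script dispatch. Ordering matters:
// more specific patterns are checked before more general ones.
function resolveView(path) {
  const routes = [
    { pattern: /^\/blog\/([\w-]+)$/, script: 'views/post.php' },
    { pattern: /^\/blog\/?$/,        script: 'views/blog.php' },
    { pattern: /^\/projects\/?$/,    script: 'views/projects.php' },
    { pattern: /^\/$/,               script: 'views/landing.php' },
  ];
  for (const route of routes) {
    const match = path.match(route.pattern);
    if (match) return { script: route.script, args: match.slice(1) };
  }
  return { script: 'views/notfound.php', args: [] };
}
```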
At the same time, the JavaScript side of the site needed to be completely overhauled to transition from anchor-based 'articles' to the URL-based views that exist now. This was probably the most confusing period of the entire project. I couldn't test anything, as both the JS and the PHP had to be changed to work together, meaning I had two separate, unfinished, untested systems to try to wrap my head around.
This system swapped from serving the whole page (all content) on first load to providing only the specifically requested view. It took me a little while to realise that a ramification of this was that I needed to be able to serve and display the landing page as well as all the articles that were formerly handled by the anchors. This is what prompted a rewrite from 'showing/hiding articles' to 'switching views'. The views were more general, and as a result more extensible. The landing page was no longer considered special, but was a view of its own, just lacking an `IsArticle` flag.
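A view after the rewrite might look something like this (a hypothetical shape; field names other than `IsArticle` are invented for illustration), with the landing page distinguished only by its flag rather than by special-case code:

```javascript
// Hypothetical view objects: the landing page is just another view.
const landingView = {
  path: '/',
  title: 'Home',
  IsArticle: false,
  render() { return '<main>…</main>'; },      // placeholder markup
};

const postView = {
  path: '/blog/a-post',
  title: 'A Post',
  IsArticle: true,
  render() { return '<article>…</article>'; }, // placeholder markup
};
```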
I swapped the anchor links for real URLs, intercepted their click events to prevent full page loads, and added URL-rewriting rules to my server to facilitate communication between the client and server. Additionally, since the client no longer needed to communicate directly with the blog server, and definitely not because I nearly tore my hair out trying to get a SignalR client in PHP to work, I dropped SignalR entirely.
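Click interception plus the History API is the standard way to get clean URLs without page loads. A hypothetical sketch (the `loadView` helper is assumed, and the decision logic is pulled out into a testable function):

```javascript
// Hypothetical helper: only same-origin links opening in the same tab
// should be handled client-side; everything else navigates normally.
function shouldIntercept(linkOrigin, pageOrigin, target) {
  return linkOrigin === pageOrigin && (!target || target === '_self');
}

// In the browser, the wiring would look roughly like:
// document.addEventListener('click', (e) => {
//   const a = e.target.closest('a');
//   if (!a || !shouldIntercept(a.origin, location.origin, a.target)) return;
//   e.preventDefault();
//   history.pushState({}, '', a.pathname);   // clean URL, no page load
//   loadView(a.pathname);                    // assumed: fetch & display view
// });
// window.addEventListener('popstate', () => loadView(location.pathname));
```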
The final features I had to reimplement were the blog and the projects page. For the projects page, inspired by the overhaul, I wanted something that didn't require manually inputting each project into the PHP. My first attempt was simply iterating the project files from PHP, but associating the project IDs (the-devils-cookbook, etc.) with the project paths proved impractical, as server initialisation and inter-request caching are relatively annoying tasks in PHP. Instead, I added a new HTTP API to the existing .NET blog server to provide file paths from project IDs. Unfortunately, attempting to convert the projects to PHP resulted in a somewhat messy project format, but it works... at least until I want to add a new project, but I'll cross that bridge when I get to it.
The blog followed afterwards, requiring a rewrite of the existing blog API from SignalR to HTTP. Luckily, this part was relatively pain-free, as were most of the backend changes required for the blog. I integrated a PHP markdown parser (Parsedown) into the webserver and moved on to the front end.
The JavaScript proved more challenging. Partway through implementing the post views, I realised that, in order to keep the code single-responsibility, I'd need to allow for custom handlers for views. As `blog.js` managed finding and loading posts, the view system had no knowledge of them. The usual process breaks down with dynamic content, as the primary views are registered on page load (for example, the blog page registers itself to `/blog/`) but dynamic content cannot be. To solve this, I allowed for the registration of views that override the standard process: they may implement their own loading and can return another view to the view system for display. This addition, along with the views rewrite as a whole, caused a few issues that were uncovered after I asked a friend of mine to please break my website. Following that, I was able to resolve them.
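The registry described above could be sketched like this (a hypothetical design, with invented names): static views register a path directly, while override handlers run first and may build a view for dynamic content themselves.

```javascript
// Hypothetical view registry: override handlers are consulted before the
// static path map, so dynamic content (e.g. blog posts) can be loaded
// on demand and handed back to the view system for display.
const views = new Map();  // path -> view object
const handlers = [];      // { matches, load } pairs, checked first

function registerView(path, view) { views.set(path, view); }
function registerHandler(matches, load) { handlers.push({ matches, load }); }

function resolve(path) {
  for (const h of handlers) {
    if (h.matches(path)) return h.load(path); // handler returns a view
  }
  return views.get(path);
}

// Example wiring: the blog index is static, individual posts are dynamic.
registerView('/blog/', { title: 'Blog' });
registerHandler(
  (p) => /^\/blog\/.+/.test(p),
  (p) => ({ title: 'Loaded post: ' + p })
);
```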
After testing, I found that, despite being replaced as soon as possible in JS, the metadata from the original load was still used by embeds and spiders. The fix was relatively simple, if a little tedious: I moved things around in PHP to perform inline replacement of the metadata within the page header before serving the initial load. This meant that any fresh page load would return the correct metadata for the target URL.
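The server-side idea can be expressed as a pure function (the real implementation is PHP; the placeholder token names here are made up): substitute the target view's metadata into the header template before the response goes out, so embeds and spiders see the right values without running any JS.

```javascript
// Hypothetical sketch of server-side metadata substitution: swap per-view
// values into placeholder tokens in the page-header template.
function renderHead(template, meta) {
  return template
    .replace('{{title}}', meta.title)
    .replace('{{description}}', meta.description)
    .replace('{{url}}', meta.url);
}
```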
With that, the overhaul was done(ish)! There were, and are, still some small improvements I want to make, but for the most part I don't think I'll be making any major changes to existing functionality on the site for a good while. I'll also be getting around to actually writing those TDC blog posts soon!