SPAs (single-page applications) are awesome, especially for mobile users. They let us give web users the feel and fluidity of a native app, with the ease of updating and deploying a website. They can even scale better: since much of the content fetching and template rendering happens client-side, your server has less work to do.
They are the best of all worlds. Except for SEO.
Principles of Good SEO for SPAs
Set the URL using pushState
This part of the HTML5 History API has been supported by all major browsers for over five years now, so you aren’t risking much in the way of compatibility problems.
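Here's a minimal sketch of the pattern. The route shape and `renderAlbumView` are placeholders for your own app, not a prescribed API:

```typescript
// Sketch: give each view a real, shareable URL without a full page load.
// renderAlbumView is a placeholder for your app's own rendering code.
function renderAlbumView(albumId: string): void {
  document.querySelector("main")!.textContent = `Album: ${albumId}`;
}

function showAlbum(albumId: string): void {
  renderAlbumView(albumId);
  // Update the address bar so the view is bookmarkable and shareable.
  history.pushState({ albumId }, "", `/album/${albumId}`);
}

// Keep the UI in sync when the user hits back/forward.
window.addEventListener("popstate", (event: PopStateEvent) => {
  const state = event.state as { albumId?: string } | null;
  if (state?.albumId) {
    renderAlbumView(state.albumId);
  }
});
```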
Correctly parse any URL you generate with pushState
This is an easy one to forget. Half the reason for rewriting the URL is sharing and bookmarking. Make sure your SPA can fully reconstruct a page when given only the URL. If you don’t, all you’ve done is create broken links. Your future self will not thank you.
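A sketch of what that means in practice: route from the URL on first load, not just after in-app navigation. The `/album/:id` route and the render functions are illustrative placeholders:

```typescript
// Sketch: rebuild the right view from the URL alone on first load,
// so a bookmarked or shared link lands on real content.
function renderHome(): void {
  document.querySelector("main")!.textContent = "Home";
}

function renderAlbumView(albumId: string): void {
  document.querySelector("main")!.textContent = `Album: ${albumId}`;
}

function routeFromUrl(pathname: string): void {
  const albumMatch = pathname.match(/^\/album\/([\w-]+)$/);
  if (albumMatch) {
    renderAlbumView(albumMatch[1]);
  } else {
    renderHome();
  }
}

// Route on initial load, not only after pushState navigation.
routeFromUrl(window.location.pathname);
```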
Don’t block resources with robots.txt
Don’t block your scripts and style sheets. While this is mostly an issue for existing sites that are adding SPA features, check your robots directives just in case. Don’t forget your .htaccess files.
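For example, rules like these (paths illustrative) are the anti-pattern to look for, since they stop Googlebot from fetching the assets it needs to render your pages:

```
# Anti-pattern: blocking script and style folders prevents Googlebot
# from rendering the page. Remove or narrow rules like these so your
# JS and CSS stay crawlable.
User-agent: *
Disallow: /js/
Disallow: /css/
```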
Don’t use long-running scripts
Google will time out scripts that run too long, resulting in an incomplete render and missing content. How long is “too long”? Google doesn’t say explicitly; it just warns against scripts that are “too complex or arcane”. A good rule of thumb: if you, browsing as an ordinary user, ever notice a delay after an action, the script is taking too long. Optimize it.
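One common fix is to break a long computation into small slices so no single task ties up the main thread. A sketch of the idea; the 10 ms slice budget is an illustrative assumption, not a documented Googlebot limit:

```typescript
// Sketch: process a big work list in small slices so no single task
// blocks the main thread for long.
function processInChunks<T>(items: T[], handle: (item: T) => void): void {
  let index = 0;
  const runSlice = (): void => {
    const deadline = performance.now() + 10; // ~10 ms per slice (assumed)
    while (index < items.length && performance.now() < deadline) {
      handle(items[index]);
      index++;
    }
    if (index < items.length) {
      setTimeout(runSlice, 0); // yield to the browser, then continue
    }
  };
  runSlice();
}

// Usage: build 10,000 list rows without freezing the UI.
const list = document.createElement("ul");
document.body.appendChild(list);
processInChunks(Array.from({ length: 10_000 }, (_, i) => i), (n) => {
  const li = document.createElement("li");
  li.textContent = `Track ${n}`;
  list.appendChild(li);
});
```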
Verify your site works using Google Search Console’s “Fetch as Google” tool
One of the hidden gems of Search Console is its ability to show your page as Google sees it. As part of your QA process, use Fetch as Google to make sure each page looks correct and is content-complete. This will help you spot pages whose scripts are too slow or too complex for Googlebot to process.
Use a sitemap.xml to provide a URL list of all content pages on your site
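A crawler can’t always discover every client-side route on its own, so list each content URL yourself. A minimal sitemap following the sitemaps.org protocol might look like this (URLs illustrative):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://music.example.com/album/discovery</loc>
  </url>
  <url>
    <loc>https://music.example.com/artist/daft-punk</loc>
  </url>
</urlset>
```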
Be careful not to create duplicate pages
This can happen unexpectedly when you’re building a front end to a large database of content. For example, let’s say I create a streaming music website (because that’s never been done, right?). Consider two URLs like these (illustrative):

- https://music.example.com/album/discovery
- https://music.example.com/artist/daft-punk/album/discovery

Both show the same album, but at different URLs. Which one is canonical? How would a bot know?
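The usual fix is to pick one form of the URL and declare it on every variant with a canonical link tag (URL illustrative, matching the example above):

```html
<!-- Served on every URL variant of this album page, pointing at the
     one URL you want indexed. -->
<link rel="canonical" href="https://music.example.com/album/discovery" />
```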
- Test in Chrome – If it works in Chrome, it’s reasonable to expect Googlebot can index it. Verify this using Google Search Console’s “Fetch as Google” tool.
- Avoid using hash (#) in URLs – Google doesn’t like to spider them, even when they’re generated by scripts, and they can create what look like duplicate pages
- Don’t use hashbang (#!) URLs – Google has deprecated its AJAX Crawling scheme
- Don’t block JS or CSS resources in robots.txt or other directives
- Performance matters – Profile your code. Make sure it doesn’t lag
- Don’t create duplicate pages
- Googlebot does not support Service Workers – so it can’t spider a PWA
What about Bing?
Ah, Bing. The search engine nobody admits to using, yet it still accounts for 15% of North American desktop searches and 3% of mobile searches. (Yahoo’s search is back-ended by Bing, so I’ve added the two together to get these numbers.)