The Technical SEO Audit Guide For Launching A Website
Every second, 54,500 Google searches are made around the world. That’s over 3 million searches every minute!
This is a huge opportunity for any website owner looking to grow their traffic, and with a little bit of work you can increase your share of it.
Auditing your new website’s SEO before the big launch is the key to making your company visible on search engine result pages (SERPs).
There are several small but often misunderstood SEO configurations you can apply to your website pre-launch to make sure it’s SEO-proof and ready to show up in SERPs.
In this guide, we’ll look at how to audit the most important technical SEO aspects of your website. This will give your new website the best start in life and hopefully get it appearing in Google results more quickly. If you want to read more about audit tools available then you can read our independent guide to the top 9 SEO audit tools.
1. Set up Robots.txt
Don’t be put off by the slightly technical-sounding “robots.txt”. It’s just a text file that sits on your web server and is actually pretty straightforward.
The Robots Exclusion Protocol (REP), or robots.txt, is a standard that lets webmasters instruct search engine robots which pages on their website to crawl, and therefore which pages can be discovered in search results.
The file always sits at the root of your website i.e. http://www.yourdomain.com/robots.txt.
If you want your website to show up in SERPs and you have a simple website without lots of pages, it’s best to keep all the pages crawlable. 99% of the time this is a fine solution for websites.
Sometimes we see new websites launched with the following in their robots.txt file:
User-agent: *
Disallow: /
This blocks search engine crawlers from crawling the website. It usually happens when web developers forget to remove it from the development website before putting it live. Your new website’s SEO will never get off the ground if you don’t get this right.
In case you’d like to block search engines from crawling parts of your site, use this cheat sheet by Moz to add the right instructions to your website’s robots.txt.
Remember that any robots.txt file is publicly available: anyone can see which sections of a server you’ve blocked. Avoid listing login pages or other sensitive parts of a website, as this could reveal areas that might be compromised. If you’re looking to create pages that aren’t publicly searchable, use password protection to keep visitors from viewing confidential pages you’d rather not have indexed.
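For example, to keep crawlers out of a single directory while leaving the rest of the site crawlable, a robots.txt might look like this (the /internal/ path is purely illustrative):

```
User-agent: *
Disallow: /internal/
```

Each Disallow line blocks one path prefix; everything not listed stays crawlable by default.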
The Robots.txt Effect
When you block URLs from being crawled via robots.txt, Google may still show those pages as “URL only” listings in its search results. Amazon.com, for example, blocks a number of paths in its robots.txt, and when one of those URLs appears in a Google SERP, the snippet reads:
“A description for this result is not available because of this site’s robots.txt”
This is one of the most misunderstood features of Robots.txt. Including a URL in robots.txt does not remove it from Google’s index if the URL is linked to on the web or included in a sitemap.
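If you need a page kept out of Google’s index entirely, the usual approach is a noindex robots meta tag in the page’s HTML head rather than a robots.txt rule (note the page must remain crawlable for the tag to be seen):

```html
<meta name="robots" content="noindex" />
```

A crawler that reads this tag will drop the page from its index, whereas a robots.txt block only stops it from fetching the page’s content.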
To add a robots.txt file to your website, place it in the top-level directory of your web server (i.e. http://www.yourdomain.com/robots.txt). Usually this is the same place as your website’s main “index.html” welcome page. Remember to use all lowercase for the filename: “robots.txt”, not “Robots.TXT”.
You can only have one robots.txt file per website.
Checking Your Robots.txt
Use a robots.txt analyzer to check your robots.txt file and see how it affects the crawling of your website.
If you’re in need of more information, see this guide by Moz.
You can also use Google Search Console to test which URLs are blocked (or not) by your robots.txt file. See later in the article for more info on Google Search Console.
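If you’d rather check rules programmatically, Python’s standard library includes urllib.robotparser, which applies the same matching logic a crawler would. A minimal sketch, where the rules and URLs below are illustrative:

```python
from urllib import robotparser

# Illustrative rules: block /private/ for all crawlers
rules = """
User-agent: *
Disallow: /private/
""".strip().splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)  # or: rp.set_url("https://www.yourdomain.com/robots.txt"); rp.read()

# can_fetch() reports whether a given user agent may crawl a URL
print(rp.can_fetch("Googlebot", "https://www.yourdomain.com/about"))         # True
print(rp.can_fetch("Googlebot", "https://www.yourdomain.com/private/page"))  # False
```

Swapping `parse()` for `set_url()` plus `read()` lets you test the live robots.txt on your own domain before launch.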
2. Detect & Resolve Duplicate Content Issues
Before the big launch, audit your website for any duplicate content and use the canonical URL tag. A canonical tag tells search engines which single page to index when you have a group of pages that are very similar.
Having multiple web pages that contain duplicate content can weaken your SEO in many ways:
- Search engines don’t know which version to show in search results.
- Search engines are unsure which version to include or exclude from their index.
- Search engines can’t decide whether to share link equity (link juice, authority, etc.) between duplicate versions or assign it to one particular page.
Unless you take action to resolve duplicate content issues, search engines will assign your pages a lower relevancy score for search queries. You might lose a large proportion of your organic search traffic as a result.
Use the Canonical URL tag to tell the search engines which version of the duplicate content is the original and the most important. Mark up the canonical page and the duplicate pages with a rel=”canonical” link element.
The Canonical URL tag is part of the HTML header of a web page. It looks like this:
<link rel="canonical" href="https://yourwebsite.com/" />
Note the inclusion of the full URL and domain (including http:// or https://). Canonical URLs must not use a relative path.
Using this tag will tell the search engines that the page should be treated as a copy of https://yourwebsite.com/ and that all the links and relevance metrics should be assigned to the original URL.
Here are common types of duplicate content that you should check for on your website:
- Discussion forums that can generate both regular and stripped-down pages targeted at mobile devices
- E-Commerce Store items shown or linked via multiple distinct URLs (e.g. same product, different size or colour)
- Printer-only versions of web pages
- Capitalized URLs – if the same page renders at both a capitalized URL and its lowercase equivalent, you have duplicate content
- Session IDs – This occurs when each user that visits a website is assigned a different session ID that is stored in the URL
- WordPress blogs using lots of categories or tags with only a few posts – e.g. http://mywebsite/blog/category/seo might return the same HTML as http://mywebsite/blog/tag/marketing
How do I find duplicate content?
Duplicate content isn’t always easy to find. If it isn’t reported in Google Search Console (see later in this article), you’ll need to do some detective work.
You can use a free web crawler such as Siteliner to discover any broken or duplicate URLs on your website.
You can also do a “site:” query on Google (for example, site:yourdomain.com) to see a list of all the indexed pages on your website and check whether any of them look similar. Siteliner presents any duplicate content it finds as a similar list of URLs for your website.
Using free online tools to find and fix your duplicate URLs is a cheap and easy way to increase your search engine visibility and attract more organic traffic to your website after launch.
There are also expert, paid web crawler tools such as Screaming Frog and DeepCrawl for detecting duplicate content issues on larger sites. The results of these would usually form part of a large-scale digital marketing audit.
3. Create an XML Sitemap
Let’s start with a simple question: What is a sitemap?
Here’s the definition by Google:
A sitemap is a file where you can list the web pages of your site to tell Google and other search engines about the organization of your site content. Search engine web crawlers like Googlebot read this file to more intelligently crawl your site.
So basically a Sitemap is a list of URLs on your website you create to help search engines better understand your website’s content and structure.
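A minimal XML sitemap looks something like this (the URLs and dates are placeholders for your own pages):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yourwebsite.com/</loc>
    <lastmod>2018-01-01</lastmod>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://yourwebsite.com/about</loc>
    <lastmod>2018-01-01</lastmod>
    <priority>0.8</priority>
  </url>
</urlset>
```

Each `<url>` entry needs only a `<loc>`; `<lastmod>` and `<priority>` are optional extras.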
Why do you need an XML Sitemap?
In some cases, if your content and web pages are all properly linked and easily crawlable, a sitemap won’t make much difference. However, a sitemap can still improve your website’s search visibility, as it helps Google and other search engines discover and crawl your pages more quickly.
Your website is likely to benefit from a sitemap if:
- You have a large website – Google might overlook some of your most recently updated pages as they sit quite deep in the hierarchy of your site.
- Your website has a large archive of content pages that are poorly linked to each other – search engines might not pick up all of these pages.
- Your site is new and doesn’t have many external links pointing to it – Google might be unable to discover parts of your website in the first place.
Did you catch the last one? GoogleBot might not be able to discover your site at all if it is new and has few external links pointing to it.
So if you’re doing a pre-launch SEO audit, make sure to add “XML Sitemap” to your checklist and keep GoogleBot happy.
Here’s one more benefit of sitemaps. You can also use a sitemap to indicate preferred URLs for the same content. All you need to do is to pick the canonical (preferred) URL for each of your pages, and tell Google about your preference by submitting these canonical URLs in a sitemap.
Remember that using a sitemap doesn’t guarantee that all the items in your sitemap will be crawled and indexed by Google. But you’ve done all you can to make your website easily found.
Top Tips for Handling XML Sitemaps
- Create your XML sitemap before launch and submit it to search engines once you’ve published your website.
- You can set “Priority” and “Last Modified” fields on each URL in a sitemap to give further hints to search engine crawlers.
- Don’t forget to reference your sitemap from your website’s robots.txt file. It should specify the location of your XML sitemap, alongside any directories you don’t want crawled.
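The sitemap reference in robots.txt is a single Sitemap line; a combined file might look like this (paths illustrative):

```
User-agent: *
Disallow: /internal/

Sitemap: https://www.yourdomain.com/sitemap.xml
```

The Sitemap line is independent of any User-agent block, so it can sit anywhere in the file.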
Submit Your Sitemap To Google
You can submit your sitemap directly to Google via Google Search Console.
Note that in order to submit your sitemap to Google, you must first verify your domain with Google Search Console. Once you verify your site, Google identifies you as the site owner.
- Select your site on your Google Search Console home page
- Click Crawl
- Click Sitemaps
- Click ADD/TEST SITEMAP
- Type sitemap.xml (or the path to the location of your website’s sitemap)
- Click Submit Sitemap
Expert XML Sitemap Advice for Larger Websites
Large websites can split their sitemap into multiple files using a Sitemap Index. This makes it easier to resolve indexing problems in Google Search Console. You should always submit your image sitemap separately from your web page sitemap.
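A sitemap index is itself a small XML file that points at the individual sitemap files (the filenames here are illustrative):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://yourwebsite.com/sitemap-pages.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://yourwebsite.com/sitemap-images.xml</loc>
  </sitemap>
</sitemapindex>
```

You then submit the index file once, and search engines follow it to each child sitemap.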
4. Set up Google Search Console (aka Google Webmaster Tools)
Google Search Console (GSC) is the primary mechanism for Google to communicate with Webmasters regarding the organic health and performance of websites. When combined with Google Analytics, you can get a good picture of your website’s SEO health.
The Search Analytics report helps you monitor and maintain your site’s presence in Google Search results, whilst other tools help you diagnose issues, configure crawling parameters and plenty of other useful things.
To set up Google Search Console, you first need to verify your website ownership.
To verify a property (website):
- Click Add A Property on the Search Console home page and submit your URL
- Choose one of the verification methods and follow the instructions. Not all verification methods are available for all properties; the verification page will list which methods are available and recommended for your site.
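One common verification method is the HTML tag: Google gives you a meta tag to paste into your home page’s head section. It looks something like this, where the content value is a placeholder for the unique token Google generates for you:

```html
<meta name="google-site-verification" content="YOUR-VERIFICATION-TOKEN" />
```

Once the tag is live on your home page, click Verify in Search Console and Google will check for it.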
After you’ve verified your website, you can log in and start to examine the data for your site.
Google Webmaster Tools also provides some additional settings to have your website crawled according to your preferences. For example, you can set the crawl speed and choose how often you’d like to have your website crawled by GoogleBot.
Search Console is also the perfect tool for spotting SEO errors such as broken URLs or duplicate content. Google will flag current issues so you can fix them quickly.
For a complete overview of what you can do with Google Search Console, see this comprehensive guide by Search Engine Watch.
By now, you should have a clear overview of the most critical website pre-launch SEO tactics. If any of these topics seemed a little overwhelming, don’t worry – there are many great resources that explore each subject further. You simply don’t need to be an SEO expert to do a basic SEO audit of your website. In fact, you can read one of our technical SEO audit case studies here, which will help you apply some of the skills you’ve learnt.
Having your website SEO-proofed before it launches is the key to being discoverable in search engine results and getting lots of organic traffic.
Share your pre-launch SEO experiences with us! Leave a comment with the best practices and your favourite on-site tips that have helped you make websites highly visible in SERPs.