To gain a competitive advantage in Google, you must deal with two key factors to shape your search rankings - crawlability and indexability. Without testing for these factors, you'll find it extremely hard to outrank the competition.
So, you’ve built a great website, created amazing content, used a good SEO strategy and curated backlinks from blogger outreach. However, you're still finding it really hard to rank in Google. You may ask yourself...
- is my site crawlable?
- is my site indexable?
I'll teach you how to make your site more crawlable and more attractive to the search index.
How Does Google Crawl Websites?
Google has 3 core components:
The crawler is also known as a spider, Googlebot and user agent.
The crawler’s job is to discover new web content by following links. It follows each link, then follows the links on newly discovered web pages, and so on and so forth.
The crawler then brings new content back to Google’s database for cataloging, referred to as indexing.
What Does Crawlability Mean?
Crawlability is the ease with which Google can crawl your website content and discover all site links and their destination pages without encountering dead-ends.
We’d like to assume that all your links lead to their destinations and your site is easily crawlable. But if the spider has trouble crawling your site, you’ll risk a low ranking.
What Does Indexability Mean?
Indexability is Google's ability to catalog your website content properly for relevant keyword search terms.
So, crawling of your site directly affects how the search engine indexes your pages.
How to Influence Crawlability and Indexability
Crawlability is influenced by the technical and structural aspects of your website environment.
Google may stop crawling if it discovers broken links, technical issues or an inefficient site layout.
The goal is to create an efficient site setting to influence the spiders’ ability to crawl your site.
Efficient crawling starts with good internal linking, site structure and no crawl errors.
Your website should be a well-connected network of web pages that link to one another in a logical structure.
Your web pages need to be reachable via a hyperlink. Otherwise, you risk undiscoverable pages.
The crawler can’t index pages it doesn’t find.
As I mentioned, your site should be inter-connected with relevant page-to-page links. The crawler (or site visitor) should be able to access any page within 1-2 clicks (3 max). Pages nested too deep make for a poor site structure and a bad user experience.
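To make the click-depth rule concrete, here is a minimal sketch that measures how many clicks each page sits from the homepage using a breadth-first search. The link map and page paths are hypothetical; on a real site you would build the map from a crawl export.

```python
from collections import deque

def click_depths(links, start):
    """Return the minimum number of clicks from `start` to each reachable page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# Hypothetical site: each key links to the pages in its list.
site = {
    "/": ["/blog/", "/services/"],
    "/blog/": ["/blog/seo-basics/"],
    "/blog/seo-basics/": ["/blog/seo-basics/crawl-budget/"],
    "/blog/seo-basics/crawl-budget/": ["/blog/seo-basics/crawl-budget/faq/"],
    "/services/": [],
    "/orphan/": [],  # exists, but nothing links to it
}

depths = click_depths(site, "/")
too_deep = sorted(page for page, depth in depths.items() if depth > 3)
unreachable = sorted(set(site) - set(depths))
```

Here `too_deep` collects pages more than 3 clicks from the homepage, and `unreachable` collects pages no link leads to, which are the ones the crawler will never find.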
To help Google understand your content better, the site structure should build link relationships around core topic pillar pages that link to related sub-topics. This is commonly referred to as a content cluster.
Then to help Google even further you must have those sub-topics link back to the core topic showing content priority. See the content cluster example below (from our HubSpot Content Strategy Tool) showing the core topic linking to sub-topics.
Failure to create link relationships leads to undiscoverable pages and content the crawlers have difficulty indexing (classifying).
Fix Crawl Errors
If the search engine follows a link but cannot access the page while crawling, there won’t be anything new for indexing. Your web server may return any of the following HTTP errors: 500, 502, 503, 404, etc.
Crawl errors will show up in Google’s Search Console (previously Google Webmaster Tools). So, you’ll want to fix them right away.
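If you export crawl results yourself, a rough way to triage them is to group URLs by HTTP status class. The sketch below assumes a simple mapping of URL to status code; the URLs and codes are made up for illustration.

```python
def classify_status(code):
    """Map an HTTP status code to a coarse category."""
    if 200 <= code < 300:
        return "ok"
    if 300 <= code < 400:
        return "redirect"
    if 400 <= code < 500:
        return "client error"  # e.g. 404 page not found
    if 500 <= code < 600:
        return "server error"  # e.g. 500, 502, 503
    return "unknown"

# Hypothetical crawl export: URL -> status code returned by the server.
crawl_results = {
    "https://mysite.com/": 200,
    "https://mysite.com/old-page/": 404,
    "https://mysite.com/checkout/": 503,
    "https://mysite.com/moved/": 301,
}

# Keep only the URLs that returned a crawl error (4xx or 5xx).
crawl_errors = {
    url: classify_status(code)
    for url, code in crawl_results.items()
    if code >= 400
}
```

Server errors (5xx) usually point to a hosting or application problem, while client errors (4xx) usually point to a missing or moved page.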
Fix Broken Links
Broken links typically appear after you move or rename web pages. Unless you run a sitewide search and replace, you may unintentionally leave a dead-end link behind. The crawler can’t reach a page through a broken link. Generally, you'd see a 404 error - page not found.
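As a rough illustration, the sketch below pulls the links out of a page's HTML and flags internal ones that point to paths that no longer exist. The HTML snippet and the set of existing paths are assumptions; a real check would request each URL and look at the status code.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def broken_links(html, existing_paths):
    """Return internal links (starting with "/") that match no known page."""
    collector = LinkCollector()
    collector.feed(html)
    return [
        href for href in collector.hrefs
        if href.startswith("/") and href not in existing_paths
    ]

# Hypothetical page content and list of pages that currently exist.
page_html = (
    '<p><a href="/pricing/">Pricing</a> '
    '<a href="/old-pricing/">Old pricing</a> '
    '<a href="https://example.com/">External</a></p>'
)
existing = {"/", "/pricing/", "/blog/"}

broken = broken_links(page_html, existing)
```

In this example `/old-pricing/` is the dead end: the page was renamed to `/pricing/` but one link was never updated.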
Fix Redirect Loops
A redirect is a server rule where the server will send a user to page B when page A is requested.
More specifically, a redirect loop happens when multiple redirect rules conflict. For example, redirect rule #1 says Page A points to Page B and redirect rule #2 says Page B points to Page A.
Similar to a broken link, a redirect loop may occur when moving content or renaming page URLs. The crawler can access neither Page A nor Page B.
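The Page A / Page B example can be sketched as a small check over a table of redirect rules. The rule table and paths below are hypothetical; real rules live in your server or CMS configuration.

```python
def follow_redirects(redirects, start):
    """Follow redirect rules from `start`.

    Returns (chain, is_loop): the list of URLs visited, and True if the
    chain came back to a URL it had already visited.
    """
    chain = [start]
    current = start
    while current in redirects:
        current = redirects[current]
        if current in chain:
            return chain + [current], True
        chain.append(current)
    return chain, False

# Hypothetical redirect rules: requested path -> destination path.
redirects = {
    "/page-a/": "/page-b/",  # rule #1
    "/page-b/": "/page-a/",  # rule #2 conflicts with rule #1
    "/old/": "/new/",
}

loop_chain, is_loop = follow_redirects(redirects, "/page-a/")
clean_chain, clean_is_loop = follow_redirects(redirects, "/old/")
```

Running this over every source path in your redirect rules is a quick way to spot conflicts before the crawler does.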
Forms and Scripts
Crawlers can’t access content restricted behind a form. You may have content accessed via a login form or gated content requiring a form submission before displaying.
Outdated technology or third-party scripts can restrict the crawler and prevent indexing.
How to Improve Crawling and Indexing
I’ve mentioned that linking, structure and errors will influence the crawling and indexing of your site. Now, let’s discuss how we can improve the efficiency of the crawl and create a healthy environment for indexing.
Crawler Access with Robots.txt
Robots.txt is a utility file that lives at the root of your site and is read by Google. It holds allow/disallow instructions that tell the crawler which URLs it may request, which helps Google's "crawl efficiency" and "indexing accuracy". Note that robots meta tags can also deliver per-page instructions on a case-by-case basis.
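To see how a set of robots.txt rules plays out, you can test them with the robots.txt parser in Python's standard library. The rules and URLs below are illustrative assumptions, not a recommended configuration. Keep in mind that Python's parser applies the first matching rule, while Google picks the most specific one, so treat this as a rough check only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content.
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /cart/
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Ask whether a given crawler may request a given URL.
can_fetch_blog = parser.can_fetch("Googlebot", "https://mysite.com/blog/")
can_fetch_admin = parser.can_fetch("Googlebot", "https://mysite.com/admin/settings")
```

With these rules the blog URL is allowed and the admin URL is blocked, which keeps the crawler focused on pages you actually want in search results.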
Page Load Speed
Faster-loading websites yield a better user experience and improve the bot's crawl rate.
But, note that increased crawl rate doesn’t always mean better indexed search results. Google considers over 200 factors when determining your search engine rankings.
A sitemap lists the important web pages of your site and tells the search engines about your content. The sitemap also gives valuable metadata, such as when each page was last updated. A few ways to keep sitemaps organized and crawlable:
- Update XML sitemap regularly *
- Eliminate duplicate pages ***
- Redirect pages properly (when deleting or renaming) **
- Ensure canonicalized pages *
- Use consistent, search engine friendly URLs
- Use a UTF-8 encoded sitemap *
- Check for sitemap errors regularly ***
* Feature of Yoast SEO free plugin
** Feature of Yoast SEO premium plugin
*** Google Search Console
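If you aren't using a plugin, a small XML sitemap can be generated by hand. The sketch below builds a UTF-8 encoded sitemap with a last-modified date for each page; the URLs and dates are placeholders.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Build a UTF-8 encoded XML sitemap from (url, last_modified) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod  # YYYY-MM-DD
    declaration = b'<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(urlset, encoding="utf-8")

# Hypothetical pages and last-modified dates.
pages = [
    ("https://mysite.com/", "2024-01-15"),
    ("https://mysite.com/blog/", "2024-02-01"),
]

sitemap = build_sitemap(pages)
```

You could write `sitemap` to a file named sitemap.xml at the root of your site and submit it in Google Search Console.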
Nothing attracts search engines more than authoritative, high-quality content. But, not all content is created equal. Remember that it must meet the organic keyword litmus test and provide something of value to the consumer.
Prevent Duplicate Content
Duplicate content is the same content found on multiple URLs of your website. It can occur on any site, especially ecommerce stores and blogs. A few examples:
- mysite.com/nike-air-max/ and mysite.com/sneakers/nike-air-max/
- mysite.com/my-cool-blog-post/ and mysite.com/my-category/my-cool-blog-post/
When this happens, Google doesn’t know which URL you prefer, and that may hurt crawling and indexing. It’s easily fixed with the rel="canonical" tag, which tells Google your primary URL.
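To check that both versions of a page point to the same primary URL, you can read the canonical tag out of each page's HTML. In this sketch the HTML is a stand-in for what both Nike Air Max URLs might return, and the choice of the /sneakers/ URL as the preferred one is an assumption.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower()
        if tag == "link" and rel == "canonical" and self.canonical is None:
            self.canonical = attributes.get("href")

def find_canonical(html):
    """Return the canonical URL declared in the page, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical

# Hypothetical <head> served on both duplicate URLs.
page_html = (
    "<html><head><title>Nike Air Max</title>"
    '<link rel="canonical" href="https://mysite.com/sneakers/nike-air-max/" />'
    "</head><body></body></html>"
)

canonical_url = find_canonical(page_html)
missing = find_canonical("<html><head></head><body></body></html>")
```

A page that returns `None` here has no canonical tag, so Google is left to guess which duplicate to index.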
Crawlability Testing and Index Monitoring Tools
Google SEO Tools
Use Google’s go-to list for SEO tools. Here are a few popular ones:
- PageSpeed Insights analyzes pages and suggests speed and usability improvements.
- Mobile-Friendly Test is another great tool to show mobile performance.
- Google Search Console provides rich insights into your site's crawlability and indexing. It is also the place to submit your XML sitemap, examine structured data and much more.
SEO Site Audit
Run a site audit regularly to maintain good SEO health. A site audit tool will give you a comprehensive overview of your site’s overall health allowing you to stay on top of problems.
SEMrush offers the most comprehensive set of tools and is our SEO secret weapon. The SEMrush Site Audit provides extensive data running 20 different checks that focus on the ability to crawl and index your site. We use this tool to run automatic weekly audits so we’re always optimizing.
Get yourself a good set of tools and test your website often.
Of course, no tool will make a bit of difference if you don’t follow through on its suggestions, so be diligent in your efforts. If you do, you’ll improve your website's crawlability and gain favor in the search engines.