If you have an ecommerce website and work with a good developer, you probably already have a robots.txt file in your root directory. Search engines use robots (also known as spiders or crawlers) to discover and categorize websites, and the pages they find during their crawl are the pages that show up in the search results when someone performs a search in Google, Bing or any other major search engine. Your robots.txt file is an effective way to tell search engines which areas of a website should not be processed or crawled.
To Crawl or Not to Crawl? That is the Question
Without a robots.txt file in place, your ecommerce website is completely open and crawlable, which sounds like a good thing. But using up bandwidth on irrelevant and outdated content may come at the expense of crawling and indexing important and valuable pages. You might even have some key pages that get skipped entirely.
If left unchecked, a search engine robot may crawl your shopping cart links, wishlist links, admin login pages, your development site or test links, or other content that you might not want showing up in the search results. Crawls can find personal information, temporary files, admin pages, and other pages that contain information that you may not have realized was publicly accessible.
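As a sketch, a robots.txt file that keeps crawlers away from that kind of content might look like the following. The paths here are hypothetical placeholders — match them to your own site's actual URL structure before using anything like this:

```
# Applies to all crawlers
User-agent: *

# Hypothetical example paths -- adjust to your own site
Disallow: /cart/
Disallow: /wishlist/
Disallow: /admin/
Disallow: /tmp/
```

Each `Disallow` line blocks crawling of any URL whose path begins with that prefix, so a rule like `Disallow: /cart/` covers checkout steps underneath it as well.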
It is important to remember that each website has a “crawl budget” — the limited number of pages a search engine will crawl on your site within a given period. You want to make sure that your most important pages are getting indexed and that you’re not wasting your crawl budget on temporary files and other low-value pages.
While a robots.txt file can be useful to block content that you don’t want indexed, sometimes robots.txt files inadvertently block content that website owners do want crawled and indexed. If you're having trouble with some of your key pages not getting indexed, the robots.txt file is a good place to check to identify the issue. It's a good idea to monitor what's in your robots.txt file and keep it updated.
There are a number of SEO tools that you can use to see what a robots.txt file may be blocking and one of the best tools is found, for free, in Google Search Console.
With the Google Search Console Robots.txt Tester you can test specific pages to see if they are being crawled or not.
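If you'd rather check rules programmatically, Python's standard-library `urllib.robotparser` applies the same allow/disallow logic that crawlers follow. This is a minimal sketch using made-up rules and a placeholder domain, not your actual file:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, supplied as a list of lines for illustration
rules = [
    "User-agent: *",
    "Disallow: /cart/",
    "Disallow: /admin/",
]

rp = RobotFileParser()
rp.parse(rules)

# Ask whether a generic crawler ("*") may fetch specific URLs
print(rp.can_fetch("*", "https://example.com/cart/checkout"))    # blocked by /cart/
print(rp.can_fetch("*", "https://example.com/products/widget"))  # not blocked
```

In practice you would point `RobotFileParser` at your live file with `set_url()` and `read()` instead of parsing a hard-coded list, then test the same URLs you check in the Search Console tester.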
Don't forget to leverage your XML Sitemap to list the pages you do want Google to crawl. If you're auto-generating your XML Sitemap, you might inadvertently be including pages that you're also disallowing in your robots.txt file. Keeping an eye on Google Search Console will alert you to errors like these in both your sitemap file and your robots.txt file.
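For reference, a minimal sitemap entry looks like this — `example.com` and the date are placeholders. It's worth cross-checking every `<loc>` URL your generator emits against your robots.txt disallow rules so you aren't sending Google mixed signals:

```
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/products/widget</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
</urlset>
```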
A robots.txt file can be the best thing to happen to your ecommerce website, so if you're not utilizing its power, it's time to start. If you're not sure where to start with your robots.txt file, we can help!