Google Search Console (GSC), previously known as Google Webmaster Tools (GWT), isn’t exactly a digital marketing tool. That said, once it is set up on your website it returns plenty of data to empower your digital marketing efforts. It’s unfortunate that most digital marketers use GSC only to understand the organic performance of their websites, when its primary role is to diagnose issues with the website and report them to the webmaster. Today, I have picked one of the most important sections in GSC, “Crawl Errors”, which most marketers either ignore or simply mark as fixed without doing a deep dive.
What is Crawl Error?
Crawling is the process Google uses to collect data through bots (also called crawlers or spiders). When Googlebot faces trouble collecting information from a page, GSC reports an error status for it, which is known as a crawl error.
Types of Crawl Errors
Crawl errors are primarily classified into two types based on device: Desktop and Smartphone. For each device type, crawl errors are further classified into four categories based on the nature of the issue Googlebot faced during the crawl.
Server Error
If your web server returns a 5xx response code to Googlebot for any page, GSC lists that page under Server Errors. Multiple reasons may cause such errors: the crawler may have found links to backend URLs that aren’t publicly accessible, the server may be facing bandwidth issues and crashing frequently, or the server may be located in a distant geography so that establishing a connection is a challenge. Such errors may affect the indexing of your website, which can cause a fall in traffic. In such a scenario, escalate the issue to your IT team on top priority.
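As an illustrative sketch (not a feature of GSC itself), you can periodically check your own URLs and flag anything answering with a 5xx code before Googlebot runs into it. The function names and the idea of scanning a URL list here are my own assumptions:

```python
import urllib.request
import urllib.error

def response_code(url, timeout=10):
    """Return the HTTP status code a crawler would see for `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the status code
        # is still available on the exception object.
        return err.code

def is_server_error(code):
    """True for the 5xx range that GSC groups under Server Errors."""
    return 500 <= code <= 599
```

Running `response_code` over the URLs in your sitemap and escalating anything where `is_server_error` is true gives your IT team a head start on exactly the pages GSC would report.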
404 Not Found
404 is the response code your web server sends to Googlebot when the page it is trying to access is no longer available on the website. I wouldn’t consider this a true crawl error, because if a page is not available, the web server should ideally return 404. GSC still keeps these URLs under crawl errors as an indication to the webmaster: if the URLs went down accidentally, make them live again; if they were removed intentionally, remove every sign of those URLs from the website, including broken links.
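The rule described above can be boiled down to a single decision: a request for a path that no longer exists should get 404, never 200 or a blanket redirect. A minimal sketch, where the path set and function name are stand-ins for your real routing logic:

```python
# Stand-in for the paths that actually exist on your site.
EXISTING_PATHS = {"/", "/about", "/blog"}

def status_for(path):
    """Status code the server should return for a requested `path`:
    200 when the page exists, 404 when it does not."""
    return 200 if path in EXISTING_PATHS else 404
```

Real servers resolve this through their routing layer, but the principle is the same: the 404 belongs to the server’s response, and GSC merely surfaces which URLs triggered it.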
Soft 404
This error is mostly caused by webmasters due to a lack of understanding. As explained above, if a page is not available on the website, the web server should ideally return 404. However, it’s disappointing to see how few digital marketers develop that understanding. If a page is no longer part of your website but your web server returns any response code other than 404 (such as 200, 301 or 302), it results in a soft 404. Letting this happen on your website is a serious offence in Google’s eyes and may invite an algorithmic penalty. The most common scenarios that cause soft 404s are:
- Redirecting non-existent pages to closely related pages.
- Inappropriate implementation of custom 404 page.
- In the last few months, I have observed that if a page exists on the website but everything apart from its standard sections is blank, GSC will also treat it as a soft 404.
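The three scenarios above can be approximated with a rough audit heuristic for URLs you know you have removed from the site. The phrase list and length threshold below are arbitrary assumptions for illustration, not Google’s actual detection rules:

```python
# Phrases that suggest a custom "not found" page served with a 200.
ERROR_PHRASES = ("page not found", "no longer available", "does not exist")

def looks_like_soft_404(status, body_text, min_length=200):
    """Heuristic check for a URL you have already removed from the site.

    Flags responses that behave like a 404 without actually being one:
    a redirect to some other page, an error message served with 200,
    or a near-blank page. `min_length` is an arbitrary threshold for
    "mostly blank beyond the page template".
    """
    text = body_text.strip().lower()
    if status in (404, 410):
        return False  # a real 404/410 response, not a soft one
    if 300 <= status <= 399:
        return True   # removed page redirecting to a related page
    if any(phrase in text for phrase in ERROR_PHRASES):
        return True   # custom 404 page wrongly served with 200
    return len(text) < min_length  # thin or blank page body
```

Running this over removed URLs from your server logs can surface soft-404 candidates before GSC flags them, though Google’s own classification remains the final word.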
Other
Any other reason that obstructs crawling, apart from the three types listed above, falls under this category. It is therefore a much broader category, and no single list of major causes or standard treatment can be laid down for the URLs listed here. You have to take a dynamic approach based on the individual cases you encounter.
Google Search Console is a very powerful tool to diagnose, manage and maintain the health and hygiene of a website. Crawl errors in GSC are symptoms of potentially catastrophic issues; if not dealt with on time and in the right manner, they may lead to penalisation by Google.