I'm sure most here have heard about Google Search Console. But did you know Google reports back how they fared during crawling and indexing your site, in great detail?
All this information is made accessible through the Index Coverage Report.
The report helps you identify which pages have been successfully indexed and whether your directives to search engines are working as intended.
The Index Coverage Report assigns each page one of four statuses: valid, valid with warnings, excluded, and error.
The Valid status reflects pages that have been successfully indexed. These pages are either indexed and submitted in the XML sitemap, or indexed without being submitted. In the latter case, check whether each URL should be indexed and, if so, add it to your XML sitemap. If not, make sure you implement the robots noindex directive.
Optionally, you can exclude the URLs in your robots.txt if they can cause crawl budget issues.
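If you want to confirm that a page really carries the noindex directive mentioned above, you can parse it yourself. The following is a minimal sketch rather than a definitive checker: the names `RobotsMetaParser` and `has_noindex` are made up for illustration, and it only looks at `<meta name="robots">` tags plus an optional `X-Robots-Tag` header value that you pass in yourself.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        content = attr_map.get("content", "")
        self.directives.extend(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def has_noindex(html, x_robots_tag=""):
    """Return True if the HTML or the given header value contains noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header_directives = [
        part.strip().lower() for part in x_robots_tag.split(",") if part.strip()
    ]
    return "noindex" in parser.directives or "noindex" in header_directives


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page))
```

In practice you would fetch the HTML and the response headers for each URL you're reviewing and pass them to `has_noindex`.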
The Valid with warnings status means that the pages that have been indexed have some issues that need checking. The status contains one type only — “Indexed, though blocked by robots.txt”.
This means that Google has indexed these URLs even though your robots.txt file blocks them.
Under normal circumstances, Google wouldn’t have indexed these URLs. But it found links pointing to them, and thus went ahead and indexed them anyway.
In this case, you should review these URLs, update your robots.txt, and possibly apply robots noindex directives.
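To review which URLs your robots.txt blocks, you can test them against the file locally. Here's a rough sketch using Python's standard `urllib.robotparser`. The function name `blocked_urls`, the sample rules and the example.com URLs are all hypothetical, and Python's matching may not agree with Googlebot's in every case (wildcard patterns, for instance), so treat the result as a first pass.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt text disallows."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network needed.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt and URLs, for illustration only.
sample_robots_txt = """\
User-agent: *
Disallow: /private/
"""

sample_urls = [
    "https://example.com/private/report.html",
    "https://example.com/blog/post",
]

for url in blocked_urls(sample_robots_txt, sample_urls):
    print("Blocked:", url)
```

You could feed it the URLs listed under “Indexed, though blocked by robots.txt” to see which rule changes would unblock them.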
The Excluded status means that search engines picked up a clear signal that the pages shouldn’t be indexed. It contains these 15 different types:
- Alternative page with proper canonical tag
- Blocked by page removal tool
- Blocked by robots.txt
- Blocked due to unauthorized request (401)
- Crawl anomaly
- Crawled - currently not indexed
- Discovered - currently not indexed
- Duplicate without user-selected canonical
- Duplicate, Google chose different canonical than user
- Duplicate, submitted URL not selected as canonical
- Excluded by ‘noindex’ tag
- Not found (404)
- Page removed because of legal complaint
- Page with redirect
- Soft 404
The Error status means that pages couldn’t be indexed for some reason, and contains the following eight types:
- Redirect error
- Server error (5xx)
- Submitted URL blocked by robots.txt
- Submitted URL marked ‘noindex’
- Submitted URL seems to be a Soft 404
- Submitted URL returns unauthorized request (401)
- Submitted URL has crawl issue
- Submitted URL not found (404)
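If you track these issues in a script or spreadsheet, the statuses and types listed above can be captured in a simple lookup. This is only a sketch: `TYPES_BY_STATUS`, `STATUS_BY_TYPE` and `group_by_status` are names I made up, the sample rows are hypothetical, and the Valid status is left out because its types aren't named as a list above.

```python
from collections import defaultdict

# Types per status, copied from the lists above.
TYPES_BY_STATUS = {
    "Valid with warnings": [
        "Indexed, though blocked by robots.txt",
    ],
    "Excluded": [
        "Alternative page with proper canonical tag",
        "Blocked by page removal tool",
        "Blocked by robots.txt",
        "Blocked due to unauthorized request (401)",
        "Crawl anomaly",
        "Crawled - currently not indexed",
        "Discovered - currently not indexed",
        "Duplicate without user-selected canonical",
        "Duplicate, Google chose different canonical than user",
        "Duplicate, submitted URL not selected as canonical",
        "Excluded by ‘noindex’ tag",
        "Not found (404)",
        "Page removed because of legal complaint",
        "Page with redirect",
        "Soft 404",
    ],
    "Error": [
        "Redirect error",
        "Server error (5xx)",
        "Submitted URL blocked by robots.txt",
        "Submitted URL marked ‘noindex’",
        "Submitted URL seems to be a Soft 404",
        "Submitted URL returns unauthorized request (401)",
        "Submitted URL has crawl issue",
        "Submitted URL not found (404)",
    ],
}

# Reverse lookup: issue type -> status.
STATUS_BY_TYPE = {
    issue_type: status
    for status, issue_types in TYPES_BY_STATUS.items()
    for issue_type in issue_types
}


def group_by_status(rows):
    """Group (url, issue_type) pairs by status; unlisted types go to 'Unknown'."""
    grouped = defaultdict(list)
    for url, issue_type in rows:
        grouped[STATUS_BY_TYPE.get(issue_type, "Unknown")].append(url)
    return dict(grouped)


# Hypothetical rows, for illustration only.
sample_rows = [
    ("https://example.com/old-page", "Not found (404)"),
    ("https://example.com/checkout", "Server error (5xx)"),
    ("https://example.com/tag/seo", "Soft 404"),
]

for status, urls in group_by_status(sample_rows).items():
    print(status, len(urls))
```

Grouping this way makes it easy to work through the Error bucket first and then the rest.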
Keeping an eye on each of these statuses and their types will allow you to zoom in on specific issues Google has found on your site, and thus improve its overall performance in search.
The descriptions in Google Search Console are notoriously cryptic, so I wrote a massive guide on what they mean, and what action to take.
If this starter has piqued your interest, read more about How to Find and Fix Index Coverage Errors in Google Search Console.