We take a look at what to do if your development/staging/testing site has been indexed in Google, including expert advice direct from a Google employee.
First of all, don't beat yourself up: a development site getting accidentally indexed in Google has happened at least once to every agency, developer, and in-house team on the planet.
Perhaps you found out about it via an angry email from your client or their SEO consultant, or maybe you discovered it yourself whilst checking for indexed pages.
If Google can reach your development site unimpeded, and finds no instruction not to crawl or index the URLs it discovers, there is a high chance it will store the pages in its index for users to find.
This can happen surprisingly quickly: even if your dev site was only accessible for a few days or weeks, that could be enough time for Google to index the entire site.
So if your development site does get indexed in Google, what can you do about it, and are there any quick fixes for situations where, for example, a client or management is upset?
As recommended by Google's John Mu, if you find your staging site has been indexed and there is an urgent requirement to remove it, the quickest way to remove content from Google is to use the official 'Removals Tool' found in Search Console.
I'd do a site-removal request in search console - if the site is verified, it'll be hidden in search within less than a day. After that, you have time to figure out what to do for the long run.
John Mu via TechSEO subreddit
Official video: Removals in Search Console - Daniel Waisberg
To use the Removals tool you will first need to verify the specific domain you want to remove in Search Console, if it is not already verified.
John goes on to offer some footnotes and warnings regarding use of the tool:
- If you make a mistake and need to cancel a removal request, this process should be fast.
- Remember that removals apply to both www and non-www, and both http/https.
- Using the tool properly should clear the URL from Google for around 6 months.
After temporarily removing URLs from Google, it is sensible to then work towards a permanent removal.
The most effective way to tell Google to stop indexing a page is either to use a noindex directive, or to ensure the resource responds with a 410/404 HTTP status to indicate it is no longer available.
Google has stated in the past that a noindex tag and a 404/410 should work at the same speed.
If Google returns to a resource following a temporary removal request and finds a 404/410 or a noindex tag, it will cancel the removal request, as it is no longer needed.
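As a minimal sketch of the "no longer available" approach, the Python standard-library server below answers every request with a 410 Gone status, plus an X-Robots-Tag noindex header for good measure. The handler and helper names are illustrative; in practice you would configure this in your web server or framework rather than run a standalone script.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class GoneHandler(BaseHTTPRequestHandler):
    """Answer every request with 410 Gone plus a noindex header,
    signalling to crawlers that the content is permanently removed."""

    def do_GET(self):
        self.send_response(410)
        # Header-based equivalent of a <meta name="robots"> noindex tag.
        self.send_header("X-Robots-Tag", "noindex")
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"This staging content has been removed.")

    def log_message(self, *args):
        pass  # keep console output quiet

def serve_gone(port=0):
    """Start the server on a background thread; port 0 = pick a free port."""
    server = HTTPServer(("127.0.0.1", port), GoneHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A crawler (or curl) hitting any URL on this server receives the 410 status, which Google treats as a signal to drop the page from its index.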
You could also set up authentication so that Google is unable to access the resource at all (e.g. it receives a 401 HTTP response).
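To illustrate the authentication route, here is a sketch of HTTP Basic Auth using only the Python standard library. The credentials are hypothetical; the point is that a crawler, which never sends an Authorization header, always receives a 401 and never sees the page content.

```python
import base64
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical credentials for illustration only.
USERNAME, PASSWORD = "staging", "s3cret"
EXPECTED = "Basic " + base64.b64encode(f"{USERNAME}:{PASSWORD}".encode()).decode()

class AuthHandler(BaseHTTPRequestHandler):
    """Serve content only to authenticated users; everyone else
    (including Googlebot) gets a 401 challenge."""

    def do_GET(self):
        if self.headers.get("Authorization") != EXPECTED:
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="staging"')
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"staging homepage")

    def log_message(self, *args):
        pass  # keep console output quiet

def serve_protected(port=0):
    """Start the server on a background thread; port 0 = pick a free port."""
    server = HTTPServer(("127.0.0.1", port), AuthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

In a real deployment you would use your web server's built-in auth (e.g. htpasswd-style configuration) rather than rolling your own, but the crawler-facing behaviour is the same: a 401 with no content.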
Using a robots.txt block is not a good solution if your site has already been indexed. It can take a long time to have any impact and is not a direct instruction to remove content from the index, so Google can ignore it and leave the page indexed if they wish.
If your site is already indexed in Google, using a robots.txt rule to prevent Google crawling the site will also prevent them from seeing a noindex tag/header if you add one to a page.
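This conflict is easy to check for programmatically. The sketch below uses Python's standard-library robots.txt parser to flag URLs that carry a noindex directive but are also disallowed in robots.txt, meaning Googlebot can never crawl them to see the noindex. The function name and example URLs are illustrative.

```python
from urllib.robotparser import RobotFileParser

def noindex_conflicts(robots_txt, pages):
    """Return URLs that have a noindex directive but are blocked by
    robots.txt -- Googlebot cannot crawl them to see the noindex.

    `pages` maps each URL to True/False for 'carries noindex'.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url, has_noindex in pages.items()
            if has_noindex and not parser.can_fetch("Googlebot", url)]
```

Any URL this returns is in a self-defeating state: you are asking Google to deindex a page while simultaneously forbidding it from crawling the page to read that instruction.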
Using the Removals Tool in Search Console will by default remove the URLs entirely, including the cache.
When using the tool you are also given the option to remove only the cached URL, which clears the snippet shown in search results until the resource is recrawled and a new snippet is generated.
To prevent your dev site getting indexed in Google there are a variety of methods you can use:
- Authentication (password, IP address, CMS/plugin based, etc)
- Noindex tag or header
- Robots.txt disallow rule (least recommended option)

Google's John Mu recommends the use of server-side authentication as the best method:
My recommendation is always to use server-side authentication for staging / dev sites, since it's obvious when it's blocked, and obvious when it's forgotten. Robots.txt and robots meta tags are easy to accidentally deploy to your live site.
Note: Robots.txt is not a good option because it can be ignored by Google and other search engines.
You can use any of the standard methods to stop Google indexing a WordPress staging site - eg password protection, noindex or blocking Google from crawling the site with robots.txt.
The easiest method, if you have access to the WordPress admin dashboard, is to enable the 'Discourage search engines' option via Settings > Reading. This should add a noindex tag to all your pages.