These are all of the methods you can use to block Google, Googlebot and other search engines from accessing your website.
This is anti-SEO and will have a severe impact on your traffic levels. You should only use these methods if you are entirely sure you want to stop your website from being indexed.
There are legitimate reasons for wanting to do this: you’re in the research and development phase, working on a redesign, recovering from a hack, or creating something that you’d prefer search engines didn’t see just yet.
Disallow Googlebot in robots.txt file
You can block a single crawler from accessing your website by declaring its user agent in robots.txt and disallowing it; declaring Googlebot blocks only Google.
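A minimal robots.txt sketch for this, placed at the root of your domain:

```txt
User-agent: Googlebot
Disallow: /
```

Other crawlers ignore this block entirely, so Bing, DuckDuckGo and the rest can still access and index the site.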
Disallow all search engines in robots.txt file
You can also block all crawlers and user agents by configuring your robots.txt file with a wildcard user agent, which blocks every crawler from accessing the website.
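The catch-all version looks like this:

```txt
User-agent: *
Disallow: /
```

The `*` matches every user agent, so any crawler that respects robots.txt will stay away from the whole site.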
You can specify which user agents you want to block. Here are a few of the most popular user agents roaming the web:
- bingbot: Bing’s web crawler
- AdsBot-Google: Google’s ads crawler
- Twitterbot: Twitter’s crawler
- AhrefsBot: Ahrefs’ crawler
There are many more user agents and crawlers out there that you may want to consider including as part of robots.txt rules.
Each user agent rule can be customised to match parts of the site, including subfolders, URL parameters and resources.
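A sketch of scoped rules for a single crawler (the paths here are hypothetical examples, not recommendations):

```txt
# Example paths – adjust to your own site structure
User-agent: bingbot
Disallow: /private-folder/     # a subfolder
Disallow: /*?sessionid=        # a URL parameter
Disallow: /*.pdf$              # a resource type
```

Note that the `*` wildcard and `$` end-of-URL anchor are supported by Google and Bing, but not every crawler honours them.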
meta googlebot “noindex” on every page
Use the meta robots rule with meta name=”googlebot” and add a “noindex” directive to every page on your website to prevent Googlebot from indexing your content.
Adding this rule blocks only Google from indexing your website; other search engines can still index your pages.
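The tag, placed inside the head of each page, looks like this:

```html
<meta name="googlebot" content="noindex">
```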
meta robots “noindex” on every page
Similar to the meta googlebot rule, replace googlebot with robots and you can block all search engines from adding content from your pages to their indexes.
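The all-engines version of the tag:

```html
<meta name="robots" content="noindex">
```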
X-Robots-Tag for Non HTML elements
If you have PDF files, videos or images that you’d like removed from Google’s index, you can use the X-Robots-Tag to stop search engines from indexing those resources.
It works in the same way as meta robots rules, except the directive is sent as an HTTP response header rather than in the page’s head; search engine spiders fetch the resource and respect the rule found in the header.
An X-Robots-Tag of “noindex” with no user agent named blocks all search engines.
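One way to set the header in Apache via .htaccess, assuming mod_headers is enabled (the file extensions below are examples):

```apache
# Send X-Robots-Tag: noindex for common non-HTML resources
<FilesMatch "\.(pdf|jpg|png|mp4)$">
  Header set X-Robots-Tag "noindex"
</FilesMatch>
```

Because the directive travels in the response header, it works for file types that have no head section to put a meta tag in.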
Temporarily Remove Your Website with Google Search Console
You can temporarily remove specific pages, sections or your entire website from Google’s index using the Google search console URL removals tool.
To find the tool, visit Search Console, open the property you wish to remove and click “Removals” in the left-hand menu under the “Index” heading.
Press “New Request” (it’s a big red button) to open the URL overlay.
You can choose to remove specific URLs or all URLs within the prefix.
With the temporarily remove URL tab open, you can submit specific pages to be removed by toggling “Remove this URL only”, or choose “Remove all URLs with this prefix” to remove all pages in the subfolder.
Press next to trigger the removal and follow the steps to completion (this is as far as I dared to go for the purposes of this demo).
This will remove all entered pages from Google for around six months; however, Googlebot will still crawl them in that time.
Cached URLs are also removed.
You can block the entire site by leaving the path blank and entering your domain in “URLs that start with the prefix”.
After 6 months your pages should start to return to the index.
You can also select to “clear cached URLs” via the other tab.
If Google has cached a page that you’ve since updated to change content or remove an offer, you may wish to remove the old cached version.
This can be done through the removal tool.
The URL path selection works in the same way.
Bing has a similar tool in Bing Webmaster Tools that works in a virtually identical way.
Password protect your site with HTTP authentication
Using .htaccess files you can create a password-protected area on your server; this triggers HTTP authentication.
HTTP authentication works by responding to requests with a 401 status code (or 407 for proxy authentication) until the correct credentials are entered.
This stops search engines accessing your content because search engine bots cannot pass authentication challenges or fill in forms.
Basic setup of HTTP authentication is possible through .htaccess or with Nginx.
HTTP authentication blocks all search engines.
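A sketch of the .htaccess setup (the AuthUserFile path is an example – point it at your own .htpasswd file, stored outside the web root):

```apache
AuthType Basic
AuthName "Restricted Area"
AuthUserFile /home/user/.htpasswd
Require valid-user
```

Credentials are created with Apache’s htpasswd utility (e.g. `htpasswd -c /home/user/.htpasswd username`); Nginx can read the same .htpasswd file via its auth_basic and auth_basic_user_file directives.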
Build your website with React – lol jokes, but it will slow Google down.