Where in the world does Googlebot crawl your website from? Where is Googlebot based, and should you optimise your website for crawl efficiency from a specific geographic location?
Here is Google’s list of confirmed crawling IP addresses and the locations where these IPs originate, so you can tell where your website is being crawled from when you see a specific IP address in your crawl logs.
Googlebot IP Address List
Here’s the list of IP addresses that Google uses to crawl your website from. Essentially, these are Googlebot’s IP addresses.
- 220.127.116.11/24: Google (United States)
- 18.104.22.168/24: Google (United States)
- 22.214.171.124/20: Google (United States)
- 126.96.36.199/20: Google (United States)
- 188.8.131.52/20: Google Cloud (United States)
- 184.108.40.206/19: Google Cloud (United States)
- 220.127.116.11/15: Google (United States)
- 18.104.22.168/16: Google (United States)
- 22.214.171.124/23: Google (United States)
- 126.96.36.199/24: Google (United States)
- 188.8.131.52/24: Google (United States)
- 184.108.40.206/21: Google (United States)
- 220.127.116.11/20: Google (United States)
- 18.104.22.168/19: Google (United States)
- 22.214.171.124/18: Google (United States)
- 126.96.36.199/17: Google (United States)
- 188.8.131.52/14: Google (United States)
- 184.108.40.206/13: Google (United States)
- 220.127.116.11/12: Google (United States)
- 18.104.22.168/11: Google (United States)
- 22.214.171.124/10: Google (United States)
- 126.96.36.199/10: Google (United States)
- 188.8.131.52/13: Google Cloud (United States)
- 184.108.40.206/14: Google Cloud (United States)
- 220.127.116.11/15: Google Cloud (United States)
- 18.104.22.168/16: Google Cloud (United States)
- 22.214.171.124/17: Google Cloud (United States)
- 126.96.36.199/18: Google Cloud (United States)
- 188.8.131.52/13: Google Cloud (United States)
- 184.108.40.206/12: Google Cloud (United States)
- 220.127.116.11/12: Google Cloud (United States)
- 18.104.22.168/13: Google Cloud (United States)
- 22.214.171.124/20: Google (United States)
- 126.96.36.199/19: Google (United States)
- 188.8.131.52/23: Google (United States)
- 184.108.40.206/20: Google (United States)
- 220.127.116.11/19: Google (United States)
- 18.104.22.168/19: Google (United States)
- 22.214.171.124/18: Google (United States)
- 126.96.36.199/16: Google (United States)
- 188.8.131.52/15: Google Cloud (United States)
|184.108.40.206/20||Google Cloud||United States|
|220.127.116.11/19||Google Cloud||United States|
|18.104.22.168/13||Google Cloud||United States|
|22.214.171.124/14||Google Cloud||United States|
|126.96.36.199/15||Google Cloud||United States|
|188.8.131.52/16||Google Cloud||United States|
|184.108.40.206/17||Google Cloud||United States|
|220.127.116.11/18||Google Cloud||United States|
|18.104.22.168/13||Google Cloud||United States|
|22.214.171.124/12||Google Cloud||United States|
|126.96.36.199/12||Google Cloud||United States|
|188.8.131.52/13||Google Cloud||United States|
|184.108.40.206/15||Google Cloud||United States|
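To check whether an IP address from your crawl logs falls inside one of the listed ranges, a minimal Python sketch using the standard `ipaddress` module could look like the following. The ranges in `EXAMPLE_RANGES` are reserved documentation addresses used purely as placeholders, and `ip_in_ranges` is a hypothetical helper name, so swap in the ranges you actually want to test against.

```python
import ipaddress

# Placeholder ranges from the reserved documentation blocks,
# NOT real Googlebot ranges. Replace them with ranges you trust.
EXAMPLE_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]


def ip_in_ranges(ip, cidr_ranges):
    """Return True if `ip` falls inside any of the CIDR ranges."""
    address = ipaddress.ip_address(ip)
    # strict=False tolerates ranges written with host bits set
    networks = (ipaddress.ip_network(cidr, strict=False) for cidr in cidr_ranges)
    return any(address in network for network in networks)


if __name__ == "__main__":
    for log_ip in ["192.0.2.17", "203.0.113.5"]:
        print(log_ip, ip_in_ranges(log_ip, EXAMPLE_RANGES))
```

Passing `strict=False` means a range that is not written as a clean network address is still accepted rather than raising an error.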
The obvious emergent pattern is that all of the IP addresses that Google crawls from are geographically based in the USA. Given the company was founded in it’s parent company Alphabet is registered in the United States this is not surprising. So, what does the geo-location of Google’s data centres have to do with SEO?
Allow Googlebot to Crawl
The first point almost seems too obvious to state, but don’t block Googlebot from crawling. Don’t configure bot blocking based on IP addresses included in the list above, and don’t block any bot claiming to be Googlebot unless you’re sure it’s an imitation rather than the real thing.
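One common way to tell an imitation from the real thing is a reverse DNS lookup followed by a forward lookup on the returned hostname. The sketch below assumes that genuine Googlebot hostnames end in `googlebot.com` or `google.com`; the function name and the injectable lookup parameters are hypothetical and exist only so the logic can be tried without touching the network. It handles IPv4 only.

```python
import socket

# Assumed hostname suffixes for genuine Google crawlers
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Reverse-resolve the IP, check the hostname, then forward-confirm it."""
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # gethostbyname_ex returns (hostname, aliases, ipv4_addresses)
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        hostname = reverse_lookup(ip).lower().rstrip(".")
        if not hostname.endswith(GOOGLE_SUFFIXES):
            return False
        # The hostname must resolve back to the IP we started with
        return ip in forward_lookup(hostname)
    except OSError:
        # DNS failures are treated as "not verified"
        return False
```

Calling `is_verified_googlebot(ip)` with no extra arguments performs real DNS lookups, so results are worth caching if you run it over a large log file.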
Check things like:
- your server to make sure it is reachable
- your network to understand downtime or issues that could prevent crawling
- robots.txt rules, which can block pages, site sections or the entirety of a website from being properly crawled
Block Unwanted URLs from Crawling
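Blocking crawling with robots.txt is an effective way to make better use of your website’s crawl budget.
If you have URLs that you don’t want crawled, or that would add nothing by being indexed, you can write rules in your robots.txt file to stop Googlebot from requesting them. Bear in mind that robots.txt controls crawling rather than indexing, so a blocked URL can still appear in the index if other pages link to it.
Note that meta robots noindex rules won’t necessarily help your crawl budget, because Googlebot has to crawl a page before it can see the noindex rule. They are not a fix if crawl budget is the reason your pages aren’t being indexed.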
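Before deploying robots.txt rules, it can help to test them against a list of URLs. The sketch below uses Python’s standard `urllib.robotparser`; the `/search/` and `/cart/` paths are made-up placeholders rather than recommendations, and `blocked_urls` is a hypothetical helper. Python’s parser does simple prefix matching and may not interpret wildcard rules the way Googlebot does, so treat the result as a rough check only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the paths are placeholders for illustration
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Disallow: /cart/
"""


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given rules stop `user_agent` from crawling."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    sample = [
        "https://yourdomain.com/search/?q=shoes",
        "https://yourdomain.com/some-path/",
    ]
    print(blocked_urls(ROBOTS_TXT, sample))
```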
Consolidate Duplicate Content
Google’s wording here isn’t about canonical tags. Pages with proper canonical tags, including those that reference a separate URL as the canonical version of the page, still take up crawl budget.
If you’re looking to optimise crawl budget on your website, look for cases of identical duplication, such as your site loading on both http://www.yourdomain.com/some-path/ and yourdomain.com/some-path/, each with a self-referential canonical tag.
Other similar examples of issues like this can include:
- URLs loading both with and without trailing slashes, each with a self-referential canonical tag; e.g. yourdomain.com/some-path/ and yourdomain.com/some-path
- canonical tag errors, where a space, an additional character or a self-generating URL parameter creates multiple versions of an indexable page with the same content.
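To spot this kind of duplication in a crawl export, you can collapse each URL to a normalised key and look for keys with more than one variant. This is a rough sketch: `normalise` and `find_duplicate_groups` are hypothetical helpers, and the normalisation rules (ignore the scheme, strip `www.`, strip the trailing slash) are assumptions that only cover the cases described above.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalise(url):
    """Collapse scheme, www and trailing-slash variants into one key."""
    if "://" not in url:
        url = "http://" + url  # add a scheme so urlsplit can find the host
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    key = host + path
    if parts.query:
        key += "?" + parts.query  # keep parameters so distinct pages stay apart
    return key


def find_duplicate_groups(urls):
    """Group URLs by normalised key and keep groups with several variants."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalise(url)].append(url)
    return {key: variants for key, variants in groups.items() if len(variants) > 1}
```

Each group that comes back is a set of URLs worth checking by hand: ideally only one of them should return a 200 with a self-referential canonical tag, with the rest redirecting to it.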
Find and Fix your Sitemap.xml Files
Find the sitemap.xml for your website and make sure it doesn’t list URLs that are non-indexable or not needed in the index.
By pruning and optimising your sitemap.xml files, you help Googlebot and other search engine spiders prioritise crawling of your most important, indexable URLs.
Google has retired its sitemap ping endpoint, so if you make changes to multiple pages, site hierarchy or URL structures, resubmit the sitemap in Search Console and keep its lastmod values accurate so Google can quickly discover the changes to your content.
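The sitemap clean-up described above can be sketched in Python. The example assumes you already have crawl data for each URL, represented here by a hypothetical `crawl_results` dictionary mapping each URL to a `(status_code, is_indexable)` pair; how you collect that data is up to your crawler of choice.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap.xml files
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Extract every <loc> value from a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    locs = root.findall("sm:url/sm:loc", SITEMAP_NS)
    return [loc.text.strip() for loc in locs if loc.text]


def urls_to_remove(sitemap_xml, crawl_results):
    """Flag sitemap URLs that do not return 200 or are not indexable.

    crawl_results is a hypothetical dict: url -> (status_code, is_indexable).
    URLs missing from the crawl data are flagged too, so they get reviewed.
    """
    flagged = []
    for url in sitemap_urls(sitemap_xml):
        status, indexable = crawl_results.get(url, (None, False))
        if status != 200 or not indexable:
            flagged.append(url)
    return flagged
```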
Use a CDN
Even if you’re running a website aimed at a single location, such as a U.K. site serving only London, you’ll still benefit from serving your site via a CDN (content delivery network).
A content delivery network caches copies of your site’s files in data centres around the world, meaning the site is faster to load both for users visiting from anywhere in the world and, crucially, for search engines which, in the case of Google, normally crawl from the United States.
The recommendation of moving your hosting to data centres based in the US is likely overkill and, at worst, potentially detrimental to your SEO efforts, because it is often argued that the IP location of your hosting server affects how well you rank for queries in a particular geo.
Unless your business website truly appeals to a global audience, you’re best off hosting your site on a platform with servers close to where your organisation is based.
Improve Server Response Times
The faster your server response times, the more pages Googlebot may be able to crawl across your website.
Achieve this by paying for better hosting, finding and fixing redirect chains that can slow crawls, and blocking large resources that add little or no context to the content of your pages.
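Redirect chains are easy to find once you have a map of which URL redirects to which. The sketch below works on a plain dictionary, a hypothetical stand-in for a crawler export, and makes no network requests; `redirect_chain` and `chains_to_fix` are illustrative names.

```python
def redirect_chain(start_url, redirects, max_hops=10):
    """Follow redirects from start_url and return every URL visited.

    redirects is a hypothetical dict mapping a URL to its redirect target.
    """
    chain = [start_url]
    while chain[-1] in redirects and len(chain) <= max_hops:
        target = redirects[chain[-1]]
        chain.append(target)
        if target in chain[:-1]:
            break  # redirect loop: stop once a URL repeats
    return chain


def chains_to_fix(redirects):
    """Return the chains that take more than one hop to resolve."""
    result = {}
    for url in redirects:
        chain = redirect_chain(url, redirects)
        if len(chain) > 2:  # start URL plus two or more hops
            result[url] = chain
    return result
```

Each chain returned is a candidate for pointing the first URL straight at the final destination, which removes the intermediate requests.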
Use HTTP Status Codes to Specify Changes
Googlebot may send a conditional request using the If-Modified-Since header, which carries a date from a previous crawl of the page.
If your content has been updated since that date, your server should return a 200 status code along with the full page. If not, it can return a 304 (Not Modified) status code with no body, which saves resources on both sides.
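The 200-versus-304 decision can be sketched in a few lines of Python. This is an illustration of the comparison only, not a full server: `conditional_status` and `last_modified_header` are hypothetical helpers, and the sketch assumes you know each page’s last modification time as a timezone-aware `datetime`.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def last_modified_header(last_modified):
    """Format a UTC datetime as an HTTP date for the Last-Modified header."""
    return format_datetime(last_modified.astimezone(timezone.utc), usegmt=True)


def conditional_status(last_modified, if_modified_since=None):
    """Return 304 if the page is unchanged since the header date, else 200."""
    if not if_modified_since:
        return 200  # plain request: always send the full page
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: fall back to the full page
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds
    unchanged = last_modified.replace(microsecond=0) <= since
    return 304 if unchanged else 200
```

In practice most web servers and frameworks already handle conditional requests for static files, so a check like this is mainly relevant for dynamically generated pages.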
This potential utility of If-Modified-Since underlines the critical importance of frequently updating content across all indexable pages of your website.
If you’re struggling with pages not being crawled frequently or worse, not being indexed, ensure you’ve updated the content recently and that your pages are fresh and indexable.
Sending an accurate Last-Modified response header is important here, since that is the date a conditional request is compared against.
If you’re not intending to update your pages, or they don’t need updating, then you needn’t worry about returning 304s.
Improve Page Speed
Improving the page speed of indexable pages across your website is a critical factor for helping improve crawl budget.
Google’s crawling resources are limited, so there isn’t infinite capacity standing by to crawl and index newly published pages on your website.
Improving page speed can help Googlebot collect more data, more quickly and efficiently, when it is crawling.
Page speed performance improvements can help your users to enjoy using your website more too!
Here are even more ranking factors.
High quality and useful content is what Google is seeking out. If your content meets these criteria and helps human users to achieve their goals, then your site could be crawled more frequently.
One way to prove how useful your content is would be to gain authoritative backlinks from high-quality websites within your niche.
With a lot of links pointing to different pages across your website, you are more likely to see an uptick in crawl frequency because:
- Googlebot considers your site to be of higher quality because of the high number of links
- Googlebot will stumble across your website more frequently because of the inbound links
It is difficult to prove that proximity to Google’s data centres and crawling locations makes a difference because the benefits would likely be so negligible.
However, because Google begins crawling from the US, US sites could be something to consider for your link building efforts.
US sites are generally higher authority, have larger visit counts and are generally more profitable than sites elsewhere. So even if any increase in crawl frequency from the proximity of US sites to Google’s data centres has no measurable impact, the higher quality of some US sites could.
If you have any ambitions to grow your website internationally, it could be argued that links from US sites will help to accelerate this growth, because your site will be more likely to get crawled via an external link from a US site.