Troubleshoot Ad Manager crawler errors
Earn more revenue from your content with a
fully crawlable site. To make sure you've optimized your site for
crawling, consider all of the following issues that might affect your
site's crawlability.
Grant Google’s crawlers access in
robots.txt
To ensure we can crawl your sites, make sure you’ve given
access to Google’s crawlers.
If you’ve modified your site’s robots.txt file to disallow the Ad Manager crawler
from indexing your pages, then we are not able to serve Google ads on these pages.
Update your robots.txt file to grant our crawler access to your
pages.
Remove the following two lines of text from your robots.txt file:
User-agent: Mediapartners-Google
Disallow: /
This change allows our crawler to index the content of your
site and provide you with Google ads.
Any changes you make to your robots.txt file may not be
reflected in our index until our crawlers attempt to visit your site again.
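If you want to keep blocking other crawlers, you can instead add a rule that explicitly grants the Ad Manager crawler access. As a sketch (keep whatever other rules your site needs), an empty Disallow line for Mediapartners-Google lets that crawler fetch everything:
User-agent: Mediapartners-Google
Disallow:
Other crawlers continue to follow their own matching rules in the same file.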
Provide access to any content behind a login
If you have content behind a login, ensure you’ve set up a crawler login.
If you have not provided our crawlers a login, then it’s
possible that our crawlers are being redirected to a login page, which could
result in a “No Content” policy violation. It's also possible that our crawlers
receive a 401 (Unauthorized) or 407 (Proxy Authentication Required) error, and
thus cannot crawl the content.
Page Not Found errors
If the URL sent to Google points to a page that doesn't
exist (or no longer exists) on a site, or results in a 404 error ("Not
Found"), Google's crawlers will not successfully crawl any content.
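Whether the issue is a login wall or a missing page, a quick way to see what an unauthenticated crawler would receive from a given URL is to request it without any credentials and inspect the result. A minimal JavaScript sketch, assuming Node.js 18+ run as an ES module and a placeholder URL:
// Request the page with no cookies or credentials, the way a crawler
// without a login would, and report the status and the final URL.
const response = await fetch("https://example.com/some-article");
console.log(response.status, response.url);
// A 401, 404, or 407 status, or a final URL that lands on a login page,
// points to one of the problems described above.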
Overriding URLs
If you are overriding the page URL in your ad tags, Google’s crawlers may not be able to fetch the content of the page that is requesting an ad, especially if the URL you supply is malformed.
Generally speaking, the page URL you send to Google in your ad request should match the actual URL of the page you are monetizing, so that Google acts on the right contextual information.
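For example, with Google Publisher Tag the page URL override is set through page-level configuration. The snippet below is only a sketch with a placeholder URL; if you override the URL at all, pass the real, well-formed URL of the page that is requesting the ad:
// Sketch of a page URL override in Google Publisher Tag (placeholder URL).
// Set this before ads are requested on the page.
googletag.cmd.push(function () {
  googletag.pubads().set("page_url", "https://www.example.com/articles/my-article");
});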
Nameserver issues
If the nameservers for your domain or subdomain are not
properly directing our crawlers to your content, or have any restrictions on
where requests can come from, then our crawlers may not be able to find your
content.
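One basic check is to confirm that the hostname resolves publicly. The JavaScript sketch below (Node.js, run as an ES module, placeholder hostname) only shows what resolves from your own network, so also confirm with your DNS provider that resolution is not restricted by location or requester:
import { resolve4 } from "node:dns/promises";
// Confirm the hostname resolves and note the addresses returned.
// "example.com" is a placeholder for your own domain or subdomain.
console.log(await resolve4("example.com"));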
Broken or duplicative redirects
If your site uses redirects, there is a risk that our crawler will have trouble following them. For example, if a page sits behind many redirects, if intermediate redirects fail, or if important parameters such as cookies get dropped during redirection, the quality of crawling can decrease.
Consider minimizing the use of redirects on pages with ad
code, and ensuring they are implemented properly.
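To see what a crawler has to deal with, you can fetch a page that carries ad code and check whether it was reached through redirects. A minimal JavaScript sketch, assuming Node.js 18+ run as an ES module and a placeholder URL:
// Fetch the page and report whether redirects were followed and where the
// content was finally served from. A request that fails because there are
// too many redirects is also a sign of a crawling problem.
const response = await fetch("https://example.com/old-path");
console.log(response.status, response.redirected, response.url);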
Webhost issues
Sometimes when Google’s crawlers try to access site content,
the website’s servers are unable to respond in time. This can happen because the servers are down, slow, or overloaded with requests.
We recommend that you ensure your site is hosted on a
reliable server or by a reliable service provider.
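A simple spot check is to time how long a page with ad code takes to respond. A minimal JavaScript sketch, assuming Node.js 18+ run as an ES module; the URL and the 10-second limit are placeholders:
// Time the response and abort if it takes longer than 10 seconds.
const started = Date.now();
const response = await fetch("https://example.com/some-page", {
  signal: AbortSignal.timeout(10_000),
});
console.log(`${response.status} after ${Date.now() - started} ms`);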
Geographical, network or IP restrictions
Some sites may put in place restrictions that limit the geographies or IP ranges that can access their content, or may keep their content behind restricted networks or IP ranges (for example, 127.0.0.1).
If these restrictions prevent Google’s crawlers from reaching all your pages, please consider removing these restrictions, or making your content publicly accessible, to allow your URLs to be crawled.
Freshly published content
When you publish a new page, you may make ad requests before Google’s crawlers have had a chance to crawl the content. This is common on sites that post lots of new content, such as news sites, sites with user-generated content, sites with large product inventories, and weather sites.
Usually after the ad request is made on a new URL, the
content will get crawled within a few minutes. However, during these initial
few minutes, because your content has not yet been crawled, you may experience
low ad volume.
Personalized pages with URL parameters or dynamically
generated URL paths
Some websites include extra parameters in their URLs that
indicate the user who is logged in (for example, a SessionID), or other
information that may be unique to each visit. When this happens, Google’s
crawlers may treat the URL as a new page, even if the content is the same. This can result in a lag of a few minutes between the first ad request on the page and when the page gets crawled, as well as increased crawler load on your servers.
If the content on a page does not change, we recommend
you remove the parameters from the URL and send that information
to your web server in another way.
A simpler URL structure helps make your site easily
crawlable.
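As an illustration, the standard URL API shows how a per-visit parameter turns one page into many distinct URLs, and how dropping it restores a single crawlable URL. The parameter name below (sessionid) is only an example; on a real site you would carry that value in a cookie or other request header instead:
// Two visits to the same article look like two different pages to a crawler.
const visitA = new URL("https://example.com/article?id=42&sessionid=abc123");
const visitB = new URL("https://example.com/article?id=42&sessionid=xyz789");
// Dropping the per-visit parameter leaves one stable, crawlable URL.
for (const url of [visitA, visitB]) {
  url.searchParams.delete("sessionid");
  console.log(url.toString()); // https://example.com/article?id=42 for both
}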
POST data
If your site sends POST data along with URLs (for example,
passing form data via a POST request), it's possible that your site is
rejecting requests that are not accompanied by POST data. Note that since
Google’s crawlers will not provide any POST data, such a setup would prevent
the crawlers from accessing your page.
If the page content is determined by the data the user
inputs to the form, consider using a GET request.
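As a sketch, the minimal Node.js handler below serves form-driven results from GET query parameters, so the resulting URL (for example, /search?q=widgets) returns the same content to a visitor and to a crawler, with no POST body required. The path and parameter names are placeholders:
import { createServer } from "node:http";
// Read the form input from the query string of a GET request, so the page
// can be fetched by URL alone.
const server = createServer((request, response) => {
  const url = new URL(request.url ?? "/", "http://localhost");
  if (request.method === "GET" && url.pathname === "/search") {
    const query = url.searchParams.get("q") ?? "";
    response.writeHead(200, { "Content-Type": "text/plain; charset=utf-8" });
    response.end(`Results for: ${query}`);
    return;
  }
  response.writeHead(404);
  response.end("Not found");
});
server.listen(8080);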