| SYMPTOM - CRAWL RESULTS | POTENTIAL REASON | SOLUTION |
| --- | --- | --- |
| 0 or 1 disallowed URL is returned | The robots.txt file disallows everything | You can supply a custom robots.txt file using the "Robots Overwrite" feature in Advanced Settings > Test Settings > Robots Overwrite (see the robots.txt check below the table). |
| 1 indexable URL only, returning a 200 status code | The whole site, or at least part of it, relies on JavaScript | You will need to purchase our JavaScript rendering add-on and run a rendered crawl for accurate results. |
| | The pages matching the "Include only" rules/paths aren't linked from the primary domain | Add a start URL that contains links to pages satisfying the "Include only" rule, via Advanced Settings > Scope > Start URLs. |
| | The site has a login portal and might require cookies to be crawled | Contact support@lumar.io with the session cookie and they will add it to your project (see the cookie check below the table). |
| 1 URL crawled, with a 401 or 403 status code | The site is blocking Lumar by IP address | Make sure a default IP (52.5.118.182 or 52.86.188.211) is selected in Advanced Settings > Spider Settings > Crawler IP Settings, then have the site's webmaster whitelist these IP addresses. |
| 1 URL returned, with a 3xx status code | The primary domain redirects to a URL that isn't in the scope of the crawl | Ensure the redirect target is within the crawl scope by selecting Crawl both HTTP and HTTPS, Crawl all subdomains, or adding a specific secondary domain. Adding the redirect target as a start URL can also fix this (see the redirect check below the table). |
| 1 URL crawled with curl_GotNothing, or with no links or metrics at all | The website has security features that block crawlers spoofing another crawler's user agent (e.g. Googlebot) | If the page failed, changing the user agent in Advanced Settings from Googlebot to DeepCrawl will resolve the issue (see the user-agent test below the table). |
| 1 failed URL with reason error_Curl_Err_SSLCertificateError or error_Curl_Err_CurlError | The SSL certificate for the website might be invalid; this is a common issue when crawling staging sites | Check the certificate's validity in the browser address bar or with an external validator for more information (see the SSL check below the table), then tick "Ignore invalid SSL Certificate" in Advanced Settings > Scope > Crawl Restrictions. |
| All or most URLs return a 403 status code (the page title might be "Attention Required! \| Cloudflare") | The website has security features that block crawlers spoofing another crawler's user agent (e.g. Googlebot) | If the page was blocked, changing the user agent in Advanced Settings from Googlebot to DeepCrawl will resolve the issue. You can also test this by changing your browser's user agent to Googlebot (see the user-agent test below the table). |
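
Robots.txt check: for the disallowed-URL symptom, you can verify outside Lumar whether the live robots.txt blocks everything before applying a Robots Overwrite. A minimal Python sketch, assuming `https://www.example.com` stands in for your domain and the user-agent names are illustrative:

```python
# Check whether a site's live robots.txt disallows crawling for a given
# user agent. The domain and user-agent names are placeholders.
from urllib.robotparser import RobotFileParser

robots_url = "https://www.example.com/robots.txt"
parser = RobotFileParser(robots_url)
parser.read()

for user_agent in ("Googlebot", "deepcrawl"):
    allowed = parser.can_fetch(user_agent, "https://www.example.com/")
    print(f"{user_agent} allowed to fetch the homepage: {allowed}")
```

If every user agent comes back as disallowed, the Robots Overwrite feature is the right fix.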
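
Cookie check: for the login-portal case, you can confirm a session cookie grants access before sending it to support. A minimal sketch, assuming the `requests` package is installed; the URL, cookie name, and value are placeholders for your own:

```python
# Request a page behind a login portal with and without a session cookie
# and compare status codes. URL and cookie name/value are placeholders.
import requests

url = "https://www.example.com/members/"
without_cookie = requests.get(url, timeout=10)
with_cookie = requests.get(
    url, cookies={"SESSIONID": "your-session-cookie-value"}, timeout=10
)
print("Without cookie:", without_cookie.status_code)
print("With cookie:", with_cookie.status_code)
```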
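
Redirect check: for the 3xx symptom, a quick way to see exactly where the primary domain redirects, so you can compare the target against your crawl scope. A sketch assuming the `requests` package; the domain is a placeholder:

```python
# Fetch the primary domain without following redirects and print the
# redirect target from the Location header. URL is a placeholder.
import requests

resp = requests.get("http://example.com/", allow_redirects=False, timeout=10)
print("Status:", resp.status_code)
print("Redirects to:", resp.headers.get("Location"))
```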
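
User-agent test: for the curl_GotNothing and Cloudflare 403 symptoms, you can reproduce the block outside Lumar by requesting the same page with different user agents. A sketch assuming the `requests` package; the URL and user-agent strings below are illustrative placeholders, not Lumar's exact strings:

```python
# Request the same URL with a spoofed Googlebot user agent and a plain
# DeepCrawl-style user agent, then compare status codes. All strings are
# placeholders.
import requests

url = "https://www.example.com/"
user_agents = {
    "Googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "DeepCrawl": "Mozilla/5.0 (compatible; DeepCrawl)",
}

for name, ua in user_agents.items():
    resp = requests.get(url, headers={"User-Agent": ua}, timeout=10)
    print(f"{name}: {resp.status_code}")
```

If the Googlebot string is blocked but the DeepCrawl-style string is not, switching the crawl's user agent as described in the table is the likely fix.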
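
SSL check: for the SSL certificate errors, a minimal sketch to see whether the certificate is expired or untrusted; the hostname is a placeholder for your staging domain:

```python
# Open a TLS connection and report whether certificate verification fails.
# Hostname is a placeholder.
import socket
import ssl

hostname = "staging.example.com"
context = ssl.create_default_context()
try:
    with socket.create_connection((hostname, 443), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            cert = tls.getpeercert()
            print("Certificate OK, expires:", cert["notAfter"])
except ssl.SSLCertVerificationError as err:
    print("Certificate problem:", err)
```

If verification fails here, ticking "Ignore invalid SSL Certificate" lets the crawl proceed anyway.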