To ensure crawls complete quickly and efficiently, you can enable Crawl Safeguard in the Advanced Settings in Step 4 of the project settings (or on the crawl progress screen). Currently, this is not enabled by default, but may be in the future. For instructions on how to enable this feature, see below.
Crawl Progress Screen
When Crawl Safeguard is enabled, the crawl progress screen shows:
- A summary of the count of URLs per status code in the last 5 minutes.
- A trend of the failure rate (status codes such as 0, 429 or 5xx; see the full list below) per 5 minutes for the last hour, with the overload percentage.
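To make the first item concrete, here is a minimal sketch of how a per-status-code summary over the last 5 minutes could be computed. It is an illustration only, not the product's implementation: the `fetches` input (pairs of timestamp in seconds and status code) and the `summarize_window` function are hypothetical names chosen for this example.

```python
from collections import Counter

WINDOW_SECONDS = 5 * 60  # the 5-minute window described above


def summarize_window(fetches, now, window_seconds=WINDOW_SECONDS):
    """Count URLs per status code among fetches inside the window.

    `fetches` is a hypothetical iterable of (timestamp_seconds, status_code)
    pairs; a status code of 0 stands for a timeout.
    """
    cutoff = now - window_seconds
    return Counter(status for ts, status in fetches if cutoff < ts <= now)


# Example data (made up): the fetch at t=0 falls outside the window ending at t=600.
fetches = [(0, 200), (400, 200), (450, 503), (500, 0), (590, 200)]
summary = summarize_window(fetches, now=600)
```

With this sample data, `summary` holds two URLs with status 200, one with 503 and one timeout (0).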
Failure Rate Threshold
You can also set a threshold for the percentage of errors within a fixed timeframe. When the failure rate exceeds this threshold, the crawl is paused. You'll see a message on the crawl progress screen explaining that the crawl is paused due to errors, and you'll also receive an email so you can investigate and disable Crawl Safeguard if required.
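The pause decision can be thought of as a simple percentage comparison. The sketch below is an assumption about the logic, not a description of how the product works internally: the function names are hypothetical, and the strict "greater than" comparison is inferred from the wording "exceeded".

```python
def failure_rate_percent(failed, total):
    """Return the share of failed URLs as a percentage (0.0 if nothing was crawled)."""
    return 100.0 * failed / total if total else 0.0


def should_pause(failed, total, threshold_percent):
    """Return True when the failure rate in the timeframe exceeds the threshold.

    Assumption: the crawl pauses only when the rate is strictly above the
    threshold, not when it is exactly equal to it.
    """
    return failure_rate_percent(failed, total) > threshold_percent
```

For example, with a threshold of 25, a timeframe with 30 failures out of 100 URLs would trigger a pause under these assumptions, while 25 out of 100 would not.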
Failure Status Codes
The following status codes are flagged as failures:
- Timeout (0)
- 403
- 405
- 429
- 500
- 501
- 502
- 503
- 504
- 505
- 506
- 507
- 508
- 509
- 511
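If you want to apply the same classification to your own crawl exports or server logs, the list above can be expressed as a small lookup. This is a sketch: the names `FAILURE_STATUS_CODES` and `is_failure` are hypothetical and not part of any product API.

```python
# Status codes flagged as failures, copied from the list above.
# 0 represents a timeout; 510 is intentionally absent because it is not listed.
FAILURE_STATUS_CODES = frozenset({0, 403, 405, 429, *range(500, 510), 511})


def is_failure(status_code):
    """Return True if the status code is one of the flagged failure codes."""
    return status_code in FAILURE_STATUS_CODES
```

Note that other 4xx codes, such as 404, are not in the list and so are not treated as failures here.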
How to Enable Crawl Safeguard
You can enable Crawl Safeguard in the Advanced Settings in step 4 of the project setup. In step 4, click on Advanced Settings to open the options.
Scroll down to the Spider Settings section. At the bottom of the list you'll see Crawl Safeguard; click on it to expand the option. Check the box to enable Crawl Safeguard, and set your failure rate in the box below.
You can also enable Crawl Safeguard on the crawl progress screen if a crawl is already underway. You'll see the option on the left-hand side of the screen, with a toggle to turn it on and a field to set your failure rate.