Lumar's Advanced Settings allow you to customize the crawl to meet the exact parameters you need. In this article, we'll give you an overview of the settings available. In step 4 of the crawl setup process you'll see an 'Advanced Settings' button; click this to open up the options available.
Scope
Domain Scope
This setting allows you to set the primary domain, choose whether sub-domains and both HTTP and HTTPS will be crawled, and add any secondary domains that will be crawled. These may have been set in steps 1 and 2 of the crawl setup, but can be changed here if needed.
URL Scope
Here, you can choose to include only specific URL paths or exclude specific URL paths.
It also allows you to create page groups. Add a name and a regular expression in the 'Page URL Match' column, and select the percentage of matching URLs that you would like to crawl. URLs matching the designated path are counted, and once the limit has been reached, all further matching URLs go into the 'Page Group Restrictions' report and are not crawled.
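As a rough illustration of how a page-group limit behaves, the sketch below matches URLs against a 'Page URL Match' regex and splits them into crawled and restricted sets. The logic is hypothetical (Lumar applies limits internally as URLs are discovered), but the idea is the same:

```python
import re

def apply_page_group(urls, pattern, percent):
    """Hypothetical sketch: crawl only the first `percent`% of URLs
    matching a page-group regex; the rest are restricted."""
    matching = [u for u in urls if re.search(pattern, u)]
    limit = round(len(matching) * percent / 100)
    crawled = matching[:limit]
    restricted = matching[limit:]  # would appear in 'Page Group Restrictions'
    return crawled, restricted

urls = [
    "https://example.com/blog/a",
    "https://example.com/blog/b",
    "https://example.com/blog/c",
    "https://example.com/blog/d",
    "https://example.com/shop/x",
]
crawled, restricted = apply_page_group(urls, r"/blog/", 50)
print(len(crawled), len(restricted))  # → 2 2
```

With a 50% limit, two of the four matching blog URLs are crawled and two are restricted; the shop URL is unaffected because it never matches the page-group expression.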
Resource Restrictions
Here you can define which types of URLs you want Lumar to crawl (e.g. non-HTML, internal and/or external CSS or JS resources, images, etc.). You can also set Lumar to ignore an invalid SSL certificate.
Link Restrictions
This setting allows you to define which links you want Lumar to crawl (e.g. follow anchor links, pagination links, etc.).
Redirect Settings
Here you can choose whether to follow internal or external redirects.
Link Validation
Here you can choose which links are crawled to see if they are responding correctly.
Spider Settings
Start URLs
By default, the crawl will start from your primary domain, but you can set it to start from a different point, or multiple points, here. This would have been set in step 2 of the crawl setup process, but can be changed here if needed.
JavaScript Rendering
Here you can enable or disable JavaScript rendering. You can also add any custom rejections, any additional custom JavaScript, and any external JavaScript resources.
Crawler IP Settings
Here you can select regional IPs if required. If your crawl is blocked, or you need to crawl behind a firewall (e.g. a staging environment), you will need to ask your web team to whitelist 52.5.118.182 and 52.86.188.211.
User Agent
This is where you can set the user agent for the crawl, and change the viewport dimensions if required. This would have been set in step 1, but you can change this here if needed.
Mobile Site
If your website has a separate mobile site, you can enter settings here to help Lumar use a mobile user-agent when crawling the mobile URLs.
Robots Overwrite
This allows you to exclude additional URLs using a custom robots.txt file, letting you test the impact of pushing a new file to a live environment. You can also choose to ignore robots.txt for navigation requests and/or for resources. As mentioned above, site speed crawls are set to ignore robots.txt by default; if required, you can change these settings here.
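Before uploading a draft robots.txt, you can sanity-check its rules yourself with Python's standard `urllib.robotparser`. The rules below are hypothetical examples, not recommendations:

```python
from urllib import robotparser

# Hypothetical draft rules to test before pushing a new file live
custom_robots = """\
User-agent: *
Disallow: /checkout/
Disallow: /internal-search
"""

rp = robotparser.RobotFileParser()
rp.parse(custom_robots.splitlines())

print(rp.can_fetch("*", "https://example.com/checkout/basket"))  # → False
print(rp.can_fetch("*", "https://example.com/products/widget"))  # → True
```

This is a quick way to confirm a draft file blocks (or allows) the paths you expect before testing it in a crawl.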
Stealth Mode Crawl
This allows you to run a crawl as if a set of real users were performing it. It runs at 1 URL every 3 seconds, and the user-agent and IP address are randomized for each request.
Custom Request Header
This is where you can add any custom request headers that will be sent with every request.
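A custom header travels with every request exactly as it would in a hand-built HTTP request. The header names and values below are hypothetical placeholders; use whatever your server or web team expects:

```python
import urllib.request

# Hypothetical header names -- substitute the ones your server expects
req = urllib.request.Request(
    "https://example.com/",
    headers={
        "X-Crawl-Token": "secret-token",  # e.g. a shared secret for a firewall rule
        "Accept-Language": "en-GB",
    },
)

for name, value in req.header_items():
    print(f"{name}: {value}")
```

Each request the crawler sends would carry these name/value pairs alongside its normal headers.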
Cookies
This setting is mostly used for accessibility crawls, to ensure any cookie popup is cleared so the crawl can progress. This is not generally required for tech SEO or site speed crawls, but you can see how to configure cookie details here if you need to use it.
Extraction
Custom Extraction
Here you can use regular expressions to extract custom information from pages when they are crawled.
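A custom extraction rule is essentially a regular expression with a capture group applied to the page source. The markup and patterns below are hypothetical; adapt them to your own pages:

```python
import re

# Hypothetical page markup
html = ('<div class="product">'
        '<span class="price">£24.99</span>'
        '<span class="sku">AB-1234</span>'
        '</div>')

# Hypothetical extraction rules -- one capture group per value
price = re.search(r'class="price">([^<]+)<', html)
sku = re.search(r'class="sku">([^<]+)<', html)

print(price.group(1))  # → £24.99
print(sku.group(1))    # → AB-1234
```

Each rule's captured value would then appear against the crawled URL in your reports.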
Test Settings
Test Site Domain
This setting allows you to enter your test environment domain to allow comparisons with your live site.
Custom DNS
This allows custom DNS entries to be configured if your website does not have public DNS records (e.g. a staging environment).
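Conceptually, a custom DNS entry maps a hostname to an IP address before public DNS is consulted, much like a local hosts file. A minimal sketch (the hostname and address are hypothetical):

```python
import socket

# Hypothetical staging entry: the hostname has no public DNS record
CUSTOM_DNS = {"staging.example.com": "10.0.0.42"}

def resolve(hostname):
    """Check custom entries first, then fall back to normal DNS resolution."""
    if hostname in CUSTOM_DNS:
        return CUSTOM_DNS[hostname]
    return socket.gethostbyname(hostname)

print(resolve("staging.example.com"))  # → 10.0.0.42
```

Requests to the staging hostname resolve to the private address, while every other hostname is looked up as usual.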
Authentication
If you need to include authentication credentials in all requests using basic authentication, you can enter the details here.
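Basic authentication works by base64-encoding `username:password` into an `Authorization` header (per RFC 7617), which is then attached to every request. The credentials below are placeholders:

```python
import base64

def basic_auth_header(username, password):
    """Build the Authorization header used by HTTP basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}

print(basic_auth_header("crawler", "s3cret"))  # placeholder credentials
```

Whatever details you enter in this setting end up as a header of this shape on each request.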
Remove URL Parameters
If you have excluded any parameters from search engine crawls with URL parameter tools like Google Search Console, enter these here.
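The effect of this setting is to normalize URLs by dropping the listed parameters before pages are crawled. A sketch with a hypothetical parameter list:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

REMOVE = {"utm_source", "utm_medium", "sessionid"}  # hypothetical parameter list

def strip_params(url):
    """Drop the excluded query parameters, keeping everything else intact."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in REMOVE]
    return urlunsplit(parts._replace(query=urlencode(kept)))

print(strip_params("https://example.com/p?id=7&utm_source=mail&sessionid=abc"))
# → https://example.com/p?id=7
```

This prevents the same page being crawled repeatedly under different tracking or session parameters.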
URL Rewriting
Here you can add a regular expression to match a URL and add an output expression.
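A match expression and output expression work like a regex search-and-replace. The rule below is hypothetical (it strips a trailing slash) and is expressed in Python's `re.sub` terms to show the idea:

```python
import re

# Hypothetical rewrite rule: strip a trailing slash from URLs
match_expr = r"^(https?://[^?]+)/$"  # the URL match expression
output_expr = r"\1"                  # the output expression (group 1 = URL minus the slash)

print(re.sub(match_expr, output_expr, "https://example.com/blog/"))
# → https://example.com/blog
```

Any URL matching the expression is rewritten to the output form before it is crawled and reported.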
Report Setup
Save HTML & Screenshots
Lumar enables you to save the static and rendered HTML and screenshots during the crawl. Find out more about storing HTML and screenshots.
API Callback
This is where you can specify a URL to be called once your crawl has been completed to trigger an external application.
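On the receiving side, the callback is an ordinary HTTP request to the URL you specify, so any small web handler can act on it. A minimal sketch of a receiver follows; the payload shape is hypothetical, so check the callback documentation for the real fields:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class CrawlCallbackHandler(BaseHTTPRequestHandler):
    """Accept a crawl-complete callback and kick off downstream work."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        print("Crawl finished:", payload)  # e.g. trigger a report export here
        self.send_response(200)
        self.end_headers()

    def log_message(self, *args):  # keep the example quiet
        pass

server = HTTPServer(("127.0.0.1", 0), CrawlCallbackHandler)
# server.serve_forever()  # uncomment to listen until stopped
```

Pointing the callback at an endpoint like this lets you start an external job, such as a report export, the moment the crawl completes.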
Crawl Email Alerts
Set whether to receive email notifications on the progress of your crawl, and specify the email addresses that will receive notifications.
Report Settings
This last advanced setting allows you to change some of the specific parameters for Lumar reports.