Add more sources to your crawl
Before starting a crawl, it is worth understanding the value that each crawl source adds (step 2 of the crawl setup).
1. Web crawl: Crawl only the site itself, by following its links to deeper levels.
When you add the domain, check its www/non-www and http/https configuration. If you are not sure about subdomains, enable Lumar's "Crawl Subdomains" option and any linked subdomains will be discovered automatically.
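One quick way to check which variant of a domain is canonical is to request each one and follow the redirects. The sketch below is illustrative only, not part of Lumar; it assumes the Python `requests` package is installed and uses `example.com` as a placeholder domain.

```python
# Minimal sketch: find the scheme/host variant a domain resolves to.
import requests

def canonical_variant(domain: str) -> str:
    """Return the final URL that the domain's variants redirect to."""
    variants = [
        f"https://{domain}/",
        f"https://www.{domain}/",
        f"http://{domain}/",
        f"http://www.{domain}/",
    ]
    for url in variants:
        try:
            # Follow redirects; the landing URL is the variant to crawl.
            resp = requests.head(url, allow_redirects=True, timeout=10)
            return resp.url
        except requests.RequestException:
            continue  # this variant did not respond; try the next one
    raise ValueError(f"No variant of {domain} responded")

print(canonical_variant("example.com"))  # e.g. https://www.example.com/
```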
2. Sitemaps: Crawl a set of sitemaps and the URLs in those sitemaps. Links on these pages will not be followed or crawled.
For a detailed guide on how to add an XML sitemap to your crawl, please visit this page.
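To see what this source does under the hood, the sketch below fetches an XML sitemap and extracts the URLs it lists, recursing into sitemap indexes. It is illustrative only (Lumar handles this for you); it assumes the `requests` package, and the sitemap URL is a placeholder.

```python
# Minimal sketch: collect page URLs from a sitemap or sitemap index.
import requests
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(sitemap_url: str) -> list[str]:
    root = ET.fromstring(requests.get(sitemap_url, timeout=10).content)
    # A <sitemapindex> lists child sitemaps; a <urlset> lists page URLs.
    if root.tag.endswith("sitemapindex"):
        urls = []
        for loc in root.findall("sm:sitemap/sm:loc", NS):
            urls.extend(sitemap_urls(loc.text.strip()))
        return urls
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]

urls = sitemap_urls("https://example.com/sitemap.xml")
print(len(urls), "URLs found")
```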
3. Analytics: Upload analytics source data and crawl the URLs to discover additional landing pages on your site which may not be linked. The analytics data will be available in various reports.
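As a rough illustration of what an analytics upload contributes, the sketch below reads landing-page paths from a hypothetical analytics CSV export and diffs them against the URLs found by the web crawl. The file name, column names ("landing_page", "sessions"), and URLs are assumptions, not a required format.

```python
# Sketch: find pages that receive traffic but were never found by links.
import csv

def landing_pages(path: str, site: str) -> set[str]:
    """Read landing-page paths from an analytics export, made absolute."""
    with open(path, newline="") as f:
        return {
            site.rstrip("/") + row["landing_page"]
            for row in csv.DictReader(f)
            if int(row["sessions"]) > 0
        }

# URLs discovered by the web crawl (placeholder data).
crawled_urls = {"https://example.com/", "https://example.com/about"}

# Pages with traffic but no internal links are candidate orphan pages.
orphans = landing_pages("ga_export.csv", "https://example.com") - crawled_urls
print(sorted(orphans))
```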
4. Backlinks: Upload backlink source data, and crawl the URLs, to discover additional URLs with backlinks on your site. The backlink data will be available in various reports.
Another addition to the Lumar platform is the integration with Majestic, which automatically brings in backlink metrics for your URLs. You can add Majestic in the second step of the crawl setup, and choose whether to import backlink data from Majestic's Fresh or Historic index.
Adding Majestic backlink metrics to a crawl allows you to uncover and fix issues such as orphaned pages with backlinks, and backlinked pages that have become broken, disallowed, or redirecting.
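The sketch below illustrates the kind of check these reports automate: requesting each backlinked URL and flagging any that are broken or redirecting. It is a simplified stand-in for Lumar's own checks (which also cover robots.txt disallows); it assumes the `requests` package, and the URLs are placeholders.

```python
# Sketch: flag backlinked URLs that no longer return a 200 response.
import requests

backlinked_urls = [
    "https://example.com/old-guide",
    "https://example.com/pricing",
]

for url in backlinked_urls:
    resp = requests.head(url, allow_redirects=False, timeout=10)
    if resp.status_code >= 400:
        print(f"BROKEN   {resp.status_code} {url}")
    elif 300 <= resp.status_code < 400:
        print(f"REDIRECT {resp.status_code} {url} -> {resp.headers.get('Location')}")
```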
5. Google Search Console: Connect your Google Search Console property to add search performance data to your crawl.
With Search Console data integrated, you can use Lumar's reporting to discover powerful insights by looking at the interaction between indexability and traffic, such as Primary Pages in SERPs without Clicks, Primary Pages not in SERPs, and Error Pages in SERPs.
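As a simplified illustration of those buckets, the sketch below classifies pages by indexability, impressions, and clicks. The input rows are invented, and the logic is a rough approximation of the report definitions, not Lumar's implementation.

```python
# Sketch: bucket pages by the interaction of indexability and traffic.
pages = [
    {"url": "/pricing", "indexable": True,  "impressions": 900, "clicks": 40},
    {"url": "/old-faq", "indexable": True,  "impressions": 120, "clicks": 0},
    {"url": "/tags/x",  "indexable": True,  "impressions": 0,   "clicks": 0},
    {"url": "/oops",    "indexable": False, "impressions": 60,  "clicks": 2},
]

for p in pages:
    if p["indexable"] and p["impressions"] > 0 and p["clicks"] == 0:
        bucket = "Primary page in SERPs without clicks"
    elif p["indexable"] and p["impressions"] == 0:
        bucket = "Primary page not in SERPs"
    elif not p["indexable"] and p["impressions"] > 0:
        bucket = "Non-indexable page in SERPs"
    else:
        bucket = "OK"
    print(p["url"], "->", bucket)
```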
6. URL lists: Crawl a fixed list of URLs. Links on these pages will not be followed or crawled.
7. Log files: Upload log file summary data from log file analyser tools such as Splunk and Logz.io. You can also manually upload a log file summary free of charge.
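For a manual upload, a summary can be as simple as a count of search-engine bot hits per URL. The sketch below derives one from a combined-format access log; the log path, the Googlebot-only filter, and the output columns are assumptions, not a required format.

```python
# Sketch: summarise Googlebot requests per URL from an access log.
import csv
import re
from collections import Counter

LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

hits = Counter()
with open("access.log") as log:
    for line in log:
        m = LINE.search(line)
        if m and "Googlebot" in m.group("ua"):
            hits[m.group("path")] += 1

# Write a simple URL/count summary suitable for manual upload.
with open("log_summary.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["url", "googlebot_hits"])
    writer.writerows(hits.most_common())
```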
Ideally, a website should be crawled in full, including every linked URL on the site. However, very large websites, or sites with a complex architecture, may not be fully crawlable straight away. In that case it may be necessary to restrict the crawl to certain sections of the site, or to limit specific URL patterns, using the include/exclude URL rules in the Advanced settings (step 4 of the crawl setup).
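Conceptually, such rules act as URL filters, as in the sketch below; the regular-expression syntax here is illustrative and is not Lumar's rule syntax.

```python
# Sketch: restrict a crawl to matching URL patterns.
import re

INCLUDE = [re.compile(r"^/blog/")]     # crawl only the blog section
EXCLUDE = [re.compile(r"/page/\d+$")]  # ...but skip paginated archives

def in_scope(path: str) -> bool:
    if INCLUDE and not any(p.search(path) for p in INCLUDE):
        return False
    return not any(p.search(path) for p in EXCLUDE)

print(in_scope("/blog/crawl-sources"))  # True
print(in_scope("/blog/page/3"))         # False (excluded)
print(in_scope("/pricing"))             # False (not in the include list)
```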
Data Only
When Data Only mode is enabled for a source, Lumar only pulls information about URLs that were also found in other sources. URLs that are unique to that source will not be added to the crawl.
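In set terms, a Data Only source enriches the URLs it shares with other sources but never extends the crawl, as this small sketch shows (the URL sets are invented):

```python
# Sketch: Data Only semantics as set operations.
web_crawl = {"/", "/pricing", "/blog/"}
backlinks = {"/pricing", "/old-landing-page"}  # a Data Only source

crawled = web_crawl               # /old-landing-page is NOT added
enriched = web_crawl & backlinks  # URLs that receive backlink data
print("Crawled:", sorted(crawled))
print("Enriched with backlink data:", sorted(enriched))
```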