Below is more information on how Lumar calculates metrics and reports.
What are metrics in Lumar?
A metric is a piece of information about a page, link, or sitemap that we have either extracted from a URL or calculated in our system (e.g. DeepRank).
Here are some examples of metrics we store around a URL:
- Title tag
- URL
- Meta robots tag
- HTTP Header
- DeepRank
- Clicks
- Impressions
There are different levels of metrics which we have to calculate within the Lumar system.
For example, Meta Noindex is a low-level true-or-false metric which tells you whether a page has the noindex meta tag. Indexable is a high-level metric which has to take several other metrics into account to be accurate (such as noindex tags, headers, canonicalisation, etc.). Once all of these metrics have been calculated, our system can identify whether a page is indexable or non-indexable.
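As a rough illustration of how a high-level metric can be built from low-level ones, here is a minimal Python sketch. The class, field names, and rules are assumptions made for this example only; they are not Lumar's actual schema or indexability logic, which takes more signals into account.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageMetrics:
    # Hypothetical low-level metrics; names are illustrative, not Lumar's schema.
    url: str
    meta_noindex: bool             # page has a noindex meta tag
    header_noindex: bool           # page was served with a noindex header
    canonical_url: Optional[str]   # None when the page has no canonical tag


def is_indexable(page: PageMetrics) -> bool:
    """Combine several low-level metrics into one high-level flag (assumed rules)."""
    if page.meta_noindex or page.header_noindex:
        return False
    # In this sketch, a canonical pointing at another URL makes the page non-indexable.
    if page.canonical_url is not None and page.canonical_url != page.url:
        return False
    return True
```

For instance, a page with no noindex signals and a self-referencing canonical comes out as indexable, while the same page with `meta_noindex=True` does not.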
For all pages fetched and processed in our system, we collect more than 300 metrics which include everything from a page's title to the number of Search Console impressions.
What are reports in Lumar?
A report in Lumar is a combination of different metrics - while a metric is an individual piece of information about a page, a report takes many metrics and their values into account.
For example, the Page Title metric is the title that we extracted from your page, but the Short Titles report is a list of URLs which have a short title and are indexable.
Examples of reports in Lumar:
- Noindex pages: Pages which have a meta robots or X-Robots-Tag noindex.
- Canonicalized pages: Pages whose canonical tag is not self-referencing.
- Primary pages: Indexable pages which are unique or the primary of a set of duplicates.
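One way to picture the difference between a metric and a report is to treat a report as a filter over per-page metrics. The sketch below is a hypothetical Python example: the dictionary keys and the length threshold are assumptions chosen for illustration, not the values Lumar uses.

```python
# Assumed threshold for this example only; not Lumar's actual definition of "short".
SHORT_TITLE_MAX_CHARS = 10


def short_titles_report(pages):
    """Return URLs of indexable pages whose title is shorter than the threshold.

    Each page is a dict with hypothetical keys: 'url', 'page_title', 'indexable'.
    """
    return [
        page["url"]
        for page in pages
        if page["indexable"] and len(page["page_title"]) < SHORT_TITLE_MAX_CHARS
    ]


pages = [
    {"url": "https://example.com/", "page_title": "Home", "indexable": True},
    {"url": "https://example.com/about", "page_title": "About our company and team", "indexable": True},
    {"url": "https://example.com/old", "page_title": "Old", "indexable": False},
]
report = short_titles_report(pages)
```

Here `report` contains only the homepage URL: the second page's title is long enough, and the third page has a short title but is not indexable, so neither belongs in the report.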
What are Lumar's datasources?
During our crawls, we collect information about URLs, links between those URLs, and sitemaps. As these three types of data are so different from each other, we keep them in separate main databases.
Pages and URLs
This datasource contains each URL and all metrics related to each URL. For example:
- Indexable pages
- Non-200 pages
- 301 Redirects
Links
This datasource contains each link and related metrics, for example:
- Source URL
- Target URL
- DeepRank
- Orphaned pages
It also contains links which have issues, for example broken links, links between protocols, and a few other cases.
We do not currently store every single link and its source that we see during a crawl as this is typically terabytes of data. If you are interested in all links between pages, look at Unique Links.
Unique Links
This datasource contains every unique link that we saw during the crawl. For example:
- Anchor text
- Target page data
- Primary sources
- Nofollow
If your website has a navigation link to the homepage on every page of the website, then we will save that link once along with a count of the times we saw that link.
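The idea of saving a repeated link once, together with a count, can be sketched in Python as follows. The choice of (target URL, anchor text) as the key that makes a link "unique" is an assumption for this example; the source does not specify how Lumar defines uniqueness.

```python
from collections import Counter


def count_unique_links(links):
    """Collapse raw links into unique links with an occurrence count.

    `links` is an iterable of (source_url, target_url, anchor_text) tuples.
    Uniqueness is keyed on (target_url, anchor_text), which is an assumed
    definition used only for this illustration.
    """
    counts = Counter()
    for _source_url, target_url, anchor_text in links:
        counts[(target_url, anchor_text)] += 1
    return counts


raw_links = [
    ("https://example.com/a", "https://example.com/", "Home"),
    ("https://example.com/b", "https://example.com/", "Home"),
    ("https://example.com/c", "https://example.com/", "Home"),
    ("https://example.com/a", "https://example.com/b", "Next"),
]
unique_links = count_unique_links(raw_links)
```

In this example the navigation link to the homepage appears on three pages but is stored as a single entry with a count of 3, which is why this representation is far smaller than storing every individual link.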
Sitemaps
This datasource includes information about the sitemaps we processed during the crawl. For example:
- Broken/disallowed sitemaps
- URL count in sitemaps
- Sitemap type
Protocols / Scheme Blacklist
Lumar completely ignores links to URLs that use one of the protocols listed below (the protocol is the part of a URL before the colon, such as https in https://).
Additional protocols can be added to the crawler by creating a development ticket.
- fb-messenger
- tel
- callto
- ms-windows-store
- javascript
- mailto
- app
- data
- afs
- cid
- file
- ftp
- mid
- skype
- chrome
- sms
- geo
- fax
- faxto
- freeze
- blob (serpent only)
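To show what ignoring links by protocol might look like in practice, here is a minimal Python sketch that uses the standard library's `urllib.parse.urlsplit` to read a link's scheme. The function name and constant are hypothetical, and this is not Lumar's crawler code; it only mirrors the list above.

```python
from urllib.parse import urlsplit

# Schemes copied from the list above. 'blob' is left out because the list
# marks it as "serpent only".
IGNORED_SCHEMES = frozenset({
    "fb-messenger", "tel", "callto", "ms-windows-store", "javascript",
    "mailto", "app", "data", "afs", "cid", "file", "ftp", "mid", "skype",
    "chrome", "sms", "geo", "fax", "faxto", "freeze",
})


def is_ignored_link(href: str) -> bool:
    """Return True when the link's scheme is on the ignore list.

    Relative links have an empty scheme, so they are never ignored here.
    """
    scheme = urlsplit(href.strip()).scheme.lower()
    return scheme in IGNORED_SCHEMES
```

With this sketch, `is_ignored_link("mailto:someone@example.com")` is true, while `is_ignored_link("https://example.com/")` and a relative link such as `"/about"` are both false.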