August 2023 - Hreflang'd pages removed from duplicate pages report, new resource details view and new 'block ad scripts' list

Hreflang'd Pages Removed from Duplicate Pages Report

Release Date: August 10 2023
Historically, we've reported pages as duplicates even when they were connected by reciprocal hreflang links. However, it is very common for regional versions of a page in the same language to be identical, and most people would not consider this type of duplication an issue.
Soon we will exclude any pages connected with reciprocal hreflang links from the duplicate pages report. For example, if a website has two identical pages targeting US and UK audiences with reciprocal hreflang links:
  • domain.com/us/page (with hreflang link pointing to UK page)
  • domain.com/uk/page (with hreflang link pointing to US page)
As these two pages are hreflang reciprocated, they won't be reported as duplicates.
If your crawls include pages connected with hreflang links, you may see a reduction in the number of duplicate pages reported when this goes live.
Pages with hreflang links that are not reciprocated will still be reported as duplicates.
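
The reciprocity rule described above can be illustrated with a minimal Python sketch. It is not Lumar's implementation: the pairwise check, the function names and the extra domain.com/au/page URL are assumptions made for illustration only.

```python
from itertools import combinations


def is_reciprocal(a, b, hreflang_links):
    """Return True if page a has an hreflang link to b and b links back to a."""
    return b in hreflang_links.get(a, set()) and a in hreflang_links.get(b, set())


def reportable_duplicates(duplicate_group, hreflang_links):
    """Return the pairs of identical pages that are not hreflang reciprocated."""
    return [
        (a, b)
        for a, b in combinations(sorted(duplicate_group), 2)
        if not is_reciprocal(a, b, hreflang_links)
    ]


# Hypothetical data: the US and UK pages reciprocate, the AU page does not.
hreflang_links = {
    "domain.com/us/page": {"domain.com/uk/page"},
    "domain.com/uk/page": {"domain.com/us/page"},
    "domain.com/au/page": {"domain.com/us/page"},  # no link back from the US page
}
group = {"domain.com/us/page", "domain.com/uk/page", "domain.com/au/page"}
pairs = reportable_duplicates(group, hreflang_links)
for a, b in pairs:
    print(a, "<->", b)
```

In this sketch the US/UK pair is dropped because the links point both ways, while any pair involving the AU page is kept because its hreflang link is one-directional.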

Resource Details View

Release Date: August 10 2023

A new view dedicated to showing metrics for crawled scripts, CSS and image resources has been released.

When you see a resource in reports (such as JavaScript Files or CSS Files) you will get a more focused set of metrics for the resource, as well as the ability to see every page which references the resource, and any pages which link to the resource with anchor tags.

Screenshot of Lumar Analyze, showing the Resource Details page for a CSS resource.

New 'Block Ad Scripts' List

Release Date: August 1 2023

We've replaced the list of advertising script hostnames we block during rendering with the one from The Block List Project, a public-domain blocklist which is frequently updated.

The ‘Block ad scripts’ setting is enabled by default, and can be found in Advanced Settings > Spider Settings > JavaScript Rendering. You can disable it if you want to manually block specific scripts using the Custom rejections setting.
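
As a rough illustration of hostname-based blocking, the Python sketch below checks a script URL against a set of blocked hostnames. It assumes a hosts-style list (an IP address followed by a hostname on each line) and exact hostname matching. The hostnames and function names are hypothetical, and how Lumar applies the list during rendering is not described here.

```python
from urllib.parse import urlsplit


def parse_hosts_blocklist(text):
    """Collect hostnames from hosts-style lines such as '0.0.0.0 ads.example.com'."""
    blocked = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        blocked.add(line.split()[-1].lower())  # hostname is the last field
    return blocked


def is_blocked(script_url, blocked_hosts):
    """Return True if the script's hostname is exactly on the blocklist."""
    host = urlsplit(script_url).hostname
    return host is not None and host in blocked_hosts


# Hypothetical blocklist content, for illustration only.
sample_list = """
# ads
0.0.0.0 ads.example.com
0.0.0.0 tracker.example.net
"""
blocked_hosts = parse_hosts_blocklist(sample_list)
print(is_blocked("https://ads.example.com/tag.js", blocked_hosts))
print(is_blocked("https://cdn.example.org/app.js", blocked_hosts))
```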

Screenshot of Advanced Settings in step 4 of the crawl setup, showing the options for JavaScript Rendering, with the block ad scripts option checked.

Incomplete data for crawls run on August 1st and August 2nd 2023

Due to a faulty deployment, crawls that included any source other than Web (such as Sitemaps or Google Search Console) and were run between 1st August 2023, 14:50 (UTC) and 2nd August 2023, 11:05 (UTC) may have incomplete data. During this period, only the first 500 URLs were crawled from each source other than the Web source. The cause has been identified and fixed, and we are going to refund the credits for all these crawls.

If you believe any of these crawls are missing data, we advise deleting and re-running them. We apologize for any inconvenience caused.
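
If you keep your own record of crawl start times and sources, a check along the following lines could help shortlist crawls to review. This is a sketch under assumptions: it treats the crawl's start time as the deciding timestamp, and the source names and function name are hypothetical rather than taken from any Lumar API.

```python
from datetime import datetime, timezone

# Incident window from the notice above, in UTC.
INCIDENT_START = datetime(2023, 8, 1, 14, 50, tzinfo=timezone.utc)
INCIDENT_END = datetime(2023, 8, 2, 11, 5, tzinfo=timezone.utc)


def may_be_affected(started_at, sources):
    """Return True if a crawl started in the window and used a non-Web source."""
    in_window = INCIDENT_START <= started_at <= INCIDENT_END
    has_non_web_source = any(source != "web" for source in sources)
    return in_window and has_non_web_source


# Hypothetical crawl records: (start time in UTC, sources used).
crawls = [
    (datetime(2023, 8, 1, 16, 0, tzinfo=timezone.utc), ["web", "sitemaps"]),
    (datetime(2023, 8, 1, 16, 0, tzinfo=timezone.utc), ["web"]),
    (datetime(2023, 8, 3, 9, 0, tzinfo=timezone.utc), ["web", "sitemaps"]),
]
flags = [may_be_affected(started_at, sources) for started_at, sources in crawls]
print(flags)
```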