If you’re not entirely familiar with what regex actually is and does, it stands for regular expression and is a means of creating a search pattern. You use a regex string in order to match and locate certain sections of your website’s code. Regex can basically be used as a much more sophisticated Ctrl+F (or Cmd+F) function!
As well as being used for extraction, regex also has many other use cases within Lumar such as matching URLs for inclusion and exclusion within crawls. Google Analytics also supports regex matches, which is a very powerful way to filter reports within the tool.
There are many different flavors of regular expression syntax, but at Lumar we use Ruby's.
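To make the URL inclusion and exclusion idea more concrete, here is a minimal Ruby sketch of how a pair of patterns could decide whether a URL path belongs in a crawl. The patterns, the `crawlable?` helper and the example paths are all hypothetical and only illustrate the matching logic; this is not how Lumar implements its crawl settings.

```ruby
# Hypothetical include rule: only paths that start with /blog/.
INCLUDE_PATTERN = %r{\A/blog/}

# Hypothetical exclude rule: any query string carrying a sessionid parameter.
EXCLUDE_PATTERN = /\?.*\bsessionid=/

# Returns true when the path matches the include rule and not the exclude rule.
def crawlable?(path)
  path.match?(INCLUDE_PATTERN) && !path.match?(EXCLUDE_PATTERN)
end

puts crawlable?("/blog/regex-basics")                # => true
puts crawlable?("/blog/regex-basics?sessionid=123")  # => false
puts crawlable?("/shop/kettle")                      # => false
```

Anchoring the include pattern with `\A` keeps it from matching `/blog/` somewhere in the middle of a longer path.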
The search patterns you create can be teamed up with SEO tools that present the snippets of code you find in a meaningful, digestible way. If you create a custom extraction query using regex within Lumar to search for product prices that are coded into your site, for example, then you’ll get a handy list of pages with their prices listed for you.
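As a rough illustration of the product price example, the Ruby sketch below runs one regex over a few made-up pages and collects the captured prices. The markup (a `span` with the class `price`), the URLs and the `extract_prices` helper are assumptions for the sake of the example; your own templates will almost certainly be coded differently.

```ruby
# Hypothetical HTML fragments standing in for crawled pages.
PAGES = {
  "/products/kettle"  => '<span class="price">$24.99</span>',
  "/products/toaster" => '<span class="price">$39.00</span>',
  "/about"            => '<h1>About us</h1>'
}

# Capture group 1 holds the numeric price; the class name is an assumption.
PRICE_PATTERN = /<span class="price">\$(\d+\.\d{2})<\/span>/

# Returns a hash of URL => price for every page where the pattern matches.
def extract_prices(pages, pattern)
  pages.each_with_object({}) do |(url, html), prices|
    match = html.match(pattern)
    prices[url] = match[1] if match
  end
end

extract_prices(PAGES, PRICE_PATTERN).each do |url, price|
  puts "#{url}: #{price}"
end
```

Pages without a matching price, such as `/about` here, are simply left out of the result.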
RegEx Guides
Here you can find a refresher course on using regex. Understanding and writing regex can be quite daunting at first, but once you’ve got the basics of the different characters and their functions then you’ll get to grips with it in no time. Take a look at this guide on the basics of regular expression to get started.
The key to using regex correctly is establishing patterns that appear within your source code. Every website is coded differently, so there will never be a ‘one size fits all’ guide to custom extraction. The only universal strings are the ones that will always appear in the same way on any website, such as the Google Analytics code tag.
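For instance, a Google Analytics ID tends to look the same whichever site it sits on, so a single pattern can pick it up. The Ruby sketch below assumes the commonly seen `UA-XXXXX-Y` and `G-XXXXXXXXXX` formats; the exact length ranges in the pattern, the `ga_ids` helper and the sample ID are assumptions chosen for illustration.

```ruby
# Matches "UA-" style property IDs and "G-" style measurement IDs.
# The digit and character counts are assumed ranges, not an official spec.
GA_ID_PATTERN = /\b(UA-\d{4,10}-\d{1,4}|G-[A-Z0-9]{6,12})\b/

# Returns each distinct Google Analytics ID found in the HTML.
def ga_ids(html)
  html.scan(GA_ID_PATTERN).flatten.uniq
end

# Made-up page source with a hypothetical measurement ID.
SAMPLE_HTML = <<~HTML
  <script async src="https://www.googletagmanager.com/gtag/js?id=G-ABC123DEF4"></script>
  <script>gtag('config', 'G-ABC123DEF4');</script>
HTML

puts ga_ids(SAMPLE_HTML).inspect  # => ["G-ABC123DEF4"]
```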
The best way to identify the patterns of code that are unique to your site is to pick a few example pages for each of your templates, use ‘Inspect Element’ or open ‘View Page Source’ and dive in!
Rubular is a great tool for testing the regex strings you create off the back of your template analysis. It is a Ruby-based regex editor, so it’s perfect for trialling anything you want to include within Lumar.
Why is Custom Extraction so Useful?
One of the best ways of making an SEO tool, particularly a website crawler, work for you is through utilising custom extraction. This feature allows you to get right to the heart of your website and scrape the most important information from the HTML that will be most useful. Custom extraction sifts through the mass of code on your site and returns the data you want in a neat, orderly manner. When you know how to use this feature, Lumar becomes far more flexible and produces more granular data.
We already provide a high level of detail on site health and performance in our reports, but if you’d like to go beyond them and uncover even more detail about your site, a line of regex may be all you need. Custom extraction has to be tailored to the requirements of each site, so we can’t provide a blanket setup that suits everyone. Working out the regex strings that work best for your site will be well worth it, though.
To add custom extractions to your crawl, go to Advanced Settings (step 4 of the crawl setup) and scroll down to Extraction.