Improved HTML spidering (Robots, Canonical, Rel)

We now parse a number of HTML elements to better understand your website and decide which pages should be in your sitemap.

Canonical URLs

We now detect the link rel="canonical" tag.

Where we detect this tag and it points to another page, we will not include the current page in the sitemap and will instead spider the URL specified in the href attribute of the tag.
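
The canonical rule described above can be sketched in Python as follows. This is only an illustration using the standard library, not the spider's actual code; the names CanonicalFinder and canonical_decision are hypothetical, and comparing the resolved URL to the page URL with a plain string match is a simplifying assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens; attribute values can be None
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr.get("href"):
            self.canonical = attr["href"]


def canonical_decision(page_url, html):
    """Return (include_page, url_to_spider_instead) for one fetched page."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical is None:
        return True, None  # no canonical tag: keep the page
    target = urljoin(page_url, finder.canonical)  # resolve relative hrefs
    if target == page_url:
        return True, None  # the tag points at the page itself
    return False, target  # skip this page, spider the canonical URL
```

For example, a page at https://example.com/shoes?sort=price whose canonical href is /shoes would be left out, and https://example.com/shoes would be queued in its place.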

Meta robots

We now obey the robots meta tag.

Where a noindex value is detected we will not index the page, and where a nofollow value is detected we will not follow the URLs on that page.
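
A minimal Python sketch of how the two values could be read from a page is shown here. It is an assumption about one reasonable approach, not the spider's real implementation; MetaRobotsParser and robots_directives are hypothetical names, and only the noindex and nofollow values mentioned above are handled.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collects noindex / nofollow from <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.nofollow = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").strip().lower() != "robots":
            return
        # content is a comma-separated list such as "noindex, nofollow"
        content = attr.get("content") or ""
        directives = {d.strip().lower() for d in content.split(",")}
        if "noindex" in directives:
            self.noindex = True
        if "nofollow" in directives:
            self.nofollow = True


def robots_directives(html):
    """Return (index_page, follow_links) for one page of HTML."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return (not parser.noindex, not parser.nofollow)
```

A page carrying content="noindex, nofollow" would give (False, False): it is neither indexed nor used as a source of further URLs.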

Anchor rel attribute

We now obey rel="nofollow" in anchor tags.

As with the meta robots tag, if we detect a nofollow value on an anchor we will not follow that URL.
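
As a rough illustration of the per-link check, the Python sketch below separates the anchors on a page into those to follow and those to skip. The names LinkCollector and links_to_follow are hypothetical and this is not the spider's own code; it assumes rel is treated as a space-separated, case-insensitive list of tokens.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Splits anchor hrefs into followable and nofollow lists."""

    def __init__(self):
        super().__init__()
        self.follow = []
        self.skipped = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return  # anchors without an href are not links
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.skipped.append(href)
        else:
            self.follow.append(href)


def links_to_follow(html):
    """Return the hrefs on a page that are not marked rel="nofollow"."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.follow
```

Unlike the meta robots tag, which applies to every link on the page, this check applies only to the individual anchor that carries the attribute.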