Thursday, 11 October 2018

What's been happening in 2018.

Firstly, an apology. Due to other commitments, maintenance and updates have been slow across the board, and with recent increases in demand the service has struggled without ongoing maintenance.

We are now making the first of a number of incremental updates to move our service to an improved, more scalable (and hopefully lower-cost) approach using serverless computing.

We've also updated the website terms and privacy statement to better align with the new EU data protection requirements, although in general we do not process personal data other than your email address, which we use for the purposes of operating the service and with your express permission.

Online Sitemap Generator Update

  • New : Serverless computing code and infrastructure update (see below).
  • New : Updated processing rules and limits to reduce wasted resource (see below).
  • New : Updated EULA and privacy statement.
  • New : Updated captcha / human detection.
  • Fix : Tweaked error logging to remove redundant content.
  • Fix : Error and log links on sitemap download page.
  • Fix : Image parsing unprotected "critical section" error.

Serverless computing


Instead of having fixed virtual servers, serverless computing allows specific functions to be executed in the cloud as part of a pooled resource, meaning they only execute when required. This means we can reduce the number of virtual servers to those running the core service, while the spidering functions operate in the serverless space.

This should improve performance and, in theory, reduce costs, as compute is only used and paid for when it is actually needed, second by second.
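
As a rough illustration of the idea (not our actual code), a spidering step can be written as a stateless function that receives one URL, does its work, and returns a result. Because it keeps nothing between calls, a serverless platform can run as many copies as needed and none when idle. The names `spider_page`, `event`, and `fetch` below are hypothetical, and the fetcher is passed in so the sketch runs without network access.

```python
import re
from typing import Callable, Dict, List

# Simple pattern for href attributes; a real spider would use an HTML parser.
HREF_PATTERN = re.compile(r'href="([^"]+)"')


def spider_page(event: Dict[str, str], fetch: Callable[[str], str]) -> Dict[str, object]:
    """Hypothetical serverless entry point: process one URL per invocation.

    The function keeps no state between calls, which is what lets a
    serverless platform start and stop copies of it on demand.
    """
    url = event["url"]
    html = fetch(url)  # assumed to return the page body as text
    links: List[str] = HREF_PATTERN.findall(html)
    return {"url": url, "links": links}


if __name__ == "__main__":
    # Stand-in fetcher so the example runs offline.
    def fake_fetch(url: str) -> str:
        return '<a href="/about">About</a> <a href="/contact">Contact</a>'

    print(spider_page({"url": "https://example.com/"}, fake_fetch))
```

Each discovered link would then become a new invocation, which is where the pooled, pay-per-use model helps.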

New serverless spider architecture


Updated processing rules

  • Maximum of 2000 pages (including pages which errored).
  • Spider will run for a maximum of 60 minutes.
  • Individual page timeout of 20 seconds.
  • Download limit of 60K (for larger files, only the first 60K is downloaded).
  • Maximum of 20% of URLs hitting the 20 second timeout.
  • Average request time across all URLs must not exceed 75% of the timeout (15 seconds), sampled every 25 URLs.
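
To make the limits above concrete, here is a minimal sketch of how they could be checked during a crawl. It is an illustration only, not the service's code: the class and function names are hypothetical, "60K" is assumed to mean 60 × 1024 bytes, and the 20% timeout ratio is assumed to be checked at the same 25-URL sample points as the average request time.

```python
from dataclasses import dataclass
from typing import Optional

MAX_PAGES = 2000                   # includes pages which errored
MAX_RUN_SECONDS = 60 * 60          # 60 minute cap on the whole spider run
PAGE_TIMEOUT_SECONDS = 20.0        # individual page timeout
DOWNLOAD_LIMIT_BYTES = 60 * 1024   # assumption: "60K" means 60 KiB
MAX_TIMEOUT_FRACTION = 0.20        # at most 20% of URLs may hit the timeout
MAX_AVERAGE_FRACTION = 0.75        # average must stay within 75% of timeout (15 s)
SAMPLE_EVERY = 25                  # ratios are sampled every 25 URLs


@dataclass
class SpiderLimits:
    """Hypothetical tracker for the running totals the rules depend on."""
    pages: int = 0
    timeouts: int = 0
    total_request_seconds: float = 0.0

    def record(self, request_seconds: float, timed_out: bool = False) -> None:
        """Record one processed URL, whether it succeeded or errored."""
        self.pages += 1
        # A request can never count for longer than the page timeout.
        self.total_request_seconds += min(request_seconds, PAGE_TIMEOUT_SECONDS)
        if timed_out:
            self.timeouts += 1

    def stop_reason(self, elapsed_seconds: float) -> Optional[str]:
        """Return why the spider should stop, or None to keep going."""
        if self.pages >= MAX_PAGES:
            return "max pages"
        if elapsed_seconds >= MAX_RUN_SECONDS:
            return "max run time"
        # Ratio checks only run at sample points (assumed for both ratios).
        if self.pages > 0 and self.pages % SAMPLE_EVERY == 0:
            if self.timeouts / self.pages > MAX_TIMEOUT_FRACTION:
                return "too many timeouts"
            average = self.total_request_seconds / self.pages
            if average > MAX_AVERAGE_FRACTION * PAGE_TIMEOUT_SECONDS:
                return "average too slow"
        return None


def truncate_body(body: bytes) -> bytes:
    """Keep only the first 60K of a downloaded file."""
    return body[:DOWNLOAD_LIMIT_BYTES]
```

In practice this means a site where most pages respond in a second or two will be spidered up to the page or time cap, while a site that is consistently slow or timing out will be stopped early rather than tying up resources.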


What's to come?


Some of the updates to the spider will propagate to the next release of G-Mapper, which will increase the performance and stability of the software.

We also have a new update to the WordPress plugin in the pipeline, which will likely be released towards the end of 2018 or in early 2019.

Stay in touch and keep us updated with any issues.