Monday, 30 May 2016

Website diagnostic tool

We are sometimes contacted by people asking why their sitemap wasn't generated as expected, in some cases containing very few or no pages.

In most cases the answers are to do with the structure or format of their website and how it responds to our spider.

To help users understand and address these problems, we've started to automate some of the standard checks we would normally do manually to help resolve these issues.

The diagnostic tool runs through a series of tests designed to mimic the behaviour of our spider.

We test things like
  • Accessing your server
  • The server response codes
  • Parsing of the HTML content
  • Important tags such as
    • titles and headers
    • canonical URLs and
    • http-equiv refresh
  • Number of URLs found
The tool will return a list of results which can be helpful when trying to understand how our sitemap generator interprets your website.
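
To give a feel for the kind of checks listed above, here is a minimal Python sketch. It is illustrative only and is not our spider's actual code: the names `DiagnosticParser` and `diagnose` are hypothetical, and it assumes the page has already been fetched so that the status code and HTML are passed in directly.

```python
from html.parser import HTMLParser


class DiagnosticParser(HTMLParser):
    """Collects the tags mentioned in the checks above (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headers = []      # text of h1-h3 elements
        self.canonical = None  # href of <link rel="canonical">
        self.refresh = None    # content of <meta http-equiv="refresh">
        self.urls = []         # href values of <a> elements
        self._capture = None   # tag whose text is currently being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title" or tag in ("h1", "h2", "h3"):
            self._capture = tag
            if tag != "title":
                self.headers.append("")
        elif tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")
        elif tag == "meta" and (attrs.get("http-equiv") or "").lower() == "refresh":
            self.refresh = attrs.get("content")
        elif tag == "a" and attrs.get("href"):
            self.urls.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == self._capture:
            self._capture = None

    def handle_data(self, data):
        if self._capture == "title":
            self.title += data
        elif self._capture:
            self.headers[-1] += data


def diagnose(status_code, html):
    """Return a dictionary of simple diagnostic results for one page."""
    parser = DiagnosticParser()
    parser.feed(html)
    parser.close()
    return {
        "status_ok": 200 <= status_code < 300,
        "title": parser.title.strip(),
        "headers": [h.strip() for h in parser.headers],
        "canonical": parser.canonical,
        "refresh": parser.refresh,
        "url_count": len(parser.urls),
    }
```

A page that returns a non-2xx status, or one where `url_count` is zero, is the sort of result that would explain a sitemap with very few or no pages.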

Example diagnostics output

You can access the diagnostic tool when you download your sitemap and from the help section.

In this release we also fixed a couple of bugs.
  • Fix : processing of the http-equiv refresh meta tag.
  • Fix : noindex / nofollow rules, to ensure pages that were not indexed were still followed where there wasn't a nofollow value.
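
The intent of the noindex / nofollow fix can be sketched in a few lines of Python. This is an illustration under assumptions rather than our actual implementation: `robots_directives` is a hypothetical helper that takes the `content` value of a robots meta tag and treats indexing and following as two independent decisions.

```python
def robots_directives(content):
    """Return (index, follow) for a robots meta tag's content value.

    The two flags are decided independently, so a page marked only
    "noindex" is left out of the sitemap but its links are still followed.
    """
    values = {v.strip().lower() for v in (content or "").split(",")}
    return "noindex" not in values, "nofollow" not in values


# A noindex page with no nofollow value: not indexed, but still followed.
index, follow = robots_directives("noindex")
```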

Monday, 16 May 2016

Sitemap Infrastructure Upgrade

On the 16th April our hosting provider 123-reg made a catastrophic blunder, deleting hundreds of customer virtual servers. They have since been unable to recover the servers and many customers reported permanent loss of data. You can read more about the outage and how it unfolded on our previous blog post and on the BBC news website.

Having been through a rocky period, we are pleased to say that we are now coming out the other side of the disaster, having relocated our services to Microsoft Azure.

Our previous setup


Our previous setup with 123-reg gave us limited options in terms of resilience and scalability. We essentially hosted the service on a number of manually managed virtual private servers, with only the rudimentary backup provision offered by 123-reg.

A simplified view of our 123-reg hosting setup

Our 123-reg setup was in a single UK data centre with basic hosting facilities, giving rise to a number of challenges including:

  • Difficult to abstract and manage key services
  • No guaranteed fault domains
  • Problematic load balancing
  • Very limited backup options
  • Labour-intensive manual scaling
  • Lack of performance and security management

Our Microsoft Azure setup


Our new Azure setup has allowed us to easily abstract the various components and services and host them in much more resilient environments, with automatic redundancy/failover and much more opportunity to scale.

Azure allows us to host services around the globe at the click of a button. In the first instance we have provisioned our primary services in the West Europe Azure region, which provides a good central operating base.


An overview of our Azure setup

We now have a dedicated load balancer fronting the service, allowing us to better manage traffic across a number of server nodes and deal with maintenance and updates much more transparently.

The servers themselves sit in separate fault domains, meaning they do not depend on the same infrastructure should a fault occur.

We're also able to take server snapshots and scale servers up and out should we need to meet peaks and troughs in demand. This can even be done automatically in real time!

Azure has enabled us to easily abstract our file and database services to dedicated managed backends with inherent resilience, fault domains and automated backups.

Azure also affords us an awesome set of tools, including Application Insights and performance and security monitoring, enabling us to proactively manage the servers.

What's next


As the service beds in we hope to tune and optimise it further, so there may be a few more bumps along the way, but ultimately we hope to provide a more stable platform.

With Azure being a flexible global platform, scaling and delivering compute power where it is needed to reduce network latency becomes a real possibility. We hope to be able to improve and expand the service further in the future as budget and resources permit.

We appreciate all the support and offers of help we received during this problematic period, and would encourage you, if possible, to please contribute and help us keep sitemaps free.

Wednesday, 4 May 2016

A few new updates and fixes

We've recently been making some improvements to our online service to fix a number of issues and improve performance. We're also working hard to add additional resilience and fault handling to reduce the impact of problems that sometimes occur.

Online sitemap generator


As well as recovering the service, we also had some updates in the pipeline that we have made available as part of this re-launch :

UPDATED : Email reminder service to send emails more intelligently.
  • When the download is complete.
  • A reminder after a few days if the user hasn't downloaded the sitemap, before we delete it.
  • To ask for a rating if the sitemap has been downloaded.
FIX : Sitemap was still deleted even if cancel was clicked.
FIX : Modified and Images dropdown not loading correctly.

General sitemap / spider changes


General spider changes apply to both the online sitemap generator and G-Mapper. An updated version of G-Mapper can be downloaded here.


NEW : xsi:schemaLocation and xmlns:xsi schemas added to the XML sitemap file
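
For reference, the sketch below shows what these attributes look like on the root `urlset` element, using Python's standard `xml.etree.ElementTree`. It is an illustration only, not our generator's code: `build_sitemap` is a hypothetical helper, and the namespace and schema URLs are the ones published in the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def build_sitemap(urls):
    """Build a minimal XML sitemap string with the xsi schema attributes."""
    # Register prefixes so the output uses xmlns="..." and xmlns:xsi="...".
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("xsi", XSI_NS)

    root = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    # schemaLocation pairs the namespace with the location of its XSD.
    root.set(
        f"{{{XSI_NS}}}schemaLocation",
        f"{SITEMAP_NS} {SITEMAP_NS}/sitemap.xsd",
    )
    for address in urls:
        url = ET.SubElement(root, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = address
    return ET.tostring(root, encoding="unicode")


sitemap_xml = build_sitemap(["http://example.com/"])
```

Declaring the schema location lets XML validators check the sitemap against the published XSD without any extra configuration.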

FIX : Various code / performance enhancements
  • Improvements to how our spider makes web requests.
  • Improved performance / resource utilisation by replacing Smart Thread Pool (STP) with Microsoft Task Parallel Library (TPL)
  • Updates to image processing code / removal of legacy processing rules.
FIX : Images not indexing
FIX : Case of sitemap.htm file
FIX : Error report HTML tag.