Key sitemap generator changes

Saturday, 9 May 2020

We've been making a number of changes to our online sitemap generator. Some of these will filter down to G-Mapper over the coming weeks. It's important to understand these recent changes as they impact how we spider your website.

Hosting updates

We recently completed a migration to new hosting within Microsoft Azure to simplify management and deployment of services. We hope that this will reduce downtime and deployment errors as well as allow us to scale more easily.

Most of the major changes are now complete, so any further minor updates should cause minimal disruption.

Canonical urls

Because some sites implement canonical urls poorly or misuse them, we will no longer automatically redirect to alternate domains.

Instead we will process the page within the context of the current domain and issue warnings about canonical urls in the logs by default.

If you set the option to obey canonical urls we will assume you are spidering the correct domain and when we see a canonical url:

  • If it matches the current page we will include the current page.
  • If it is within the current domain we will ignore the current page and content and add the canonical url to the list to be spidered.
  • If it is outside of the current domain we will ignore the page and content.
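
As a rough illustration only (this is not our actual spider code), the three rules above could be sketched in Python as follows. The names `Action` and `canonical_action` are hypothetical, and the sketch assumes that "within the current domain" means an exact host name match.

```python
from enum import Enum
from typing import Tuple
from urllib.parse import urljoin, urlsplit


class Action(Enum):
    INCLUDE_PAGE = "include the current page"
    QUEUE_CANONICAL = "ignore the page and queue the canonical url"
    SKIP_PAGE = "ignore the page"


def _normalise(url: str) -> str:
    """Lower-case the scheme and host and drop any fragment for comparison."""
    parts = urlsplit(url)
    path = parts.path or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"


def canonical_action(page_url: str, canonical_href: str) -> Tuple[Action, str]:
    """Decide what to do with a page when obeying canonical urls."""
    # Canonical links may be relative, so resolve against the page url first.
    canonical = urljoin(page_url, canonical_href)
    if _normalise(canonical) == _normalise(page_url):
        return Action.INCLUDE_PAGE, page_url
    # Assumption: "current domain" is treated as an exact host match.
    if urlsplit(canonical).netloc.lower() == urlsplit(page_url).netloc.lower():
        return Action.QUEUE_CANONICAL, canonical
    return Action.SKIP_PAGE, canonical
```

For example, a page at `https://example.com/a?x=1` whose canonical url is `https://example.com/a` would be ignored and the canonical url queued for spidering instead.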



Robots meta and rel nofollow / noindex

Some sites end up with no pages or missing pages in their sitemap because of the robots meta tag and the rel nofollow attribute, and this has led to negative feedback.

By default we will now ignore these and issue warnings in the logs. If you wish for our spider to obey any robots meta or rel attributes please use the advanced settings.
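
To show the difference between the two modes, here is a minimal Python sketch using only the standard library. It is an illustration, not our spider code: `RobotsDirectiveParser`, `check_page` and the `obey_robots` flag are hypothetical names standing in for the advanced setting.

```python
from html.parser import HTMLParser


class RobotsDirectiveParser(HTMLParser):
    """Collects robots meta directives and links marked rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.meta_directives = set()
        self.nofollow_links = []

    def handle_starttag(self, tag, attrs):
        attributes = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attributes.get("name", "").lower() == "robots":
            for part in attributes.get("content", "").split(","):
                if part.strip():
                    self.meta_directives.add(part.strip().lower())
        elif tag == "a" and "nofollow" in attributes.get("rel", "").lower().split():
            self.nofollow_links.append(attributes.get("href", ""))


def check_page(html, obey_robots=False):
    """Return how a page would be treated, with warnings when directives are ignored."""
    parser = RobotsDirectiveParser()
    parser.feed(html)
    result = {"include_page": True, "follow_links": True,
              "skipped_links": [], "warnings": []}
    if "noindex" in parser.meta_directives:
        if obey_robots:
            result["include_page"] = False
        else:
            result["warnings"].append("robots meta noindex ignored")
    if "nofollow" in parser.meta_directives:
        if obey_robots:
            result["follow_links"] = False
        else:
            result["warnings"].append("robots meta nofollow ignored")
    for href in parser.nofollow_links:
        if obey_robots:
            result["skipped_links"].append(href)
        else:
            result["warnings"].append(f"rel nofollow ignored on link {href}")
    return result
```

With the default `obey_robots=False` the page and its links are kept and only warnings are produced, which mirrors the new default behaviour described above.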


 

Diagnostic tool 

The diagnostic tool has been updated to give a more realistic view of how our spider sees your page. When you use this tool it will now output a verbose log of how it processed your page. These details will also be included in the log which we include in your sitemap download file.

When a spider session returns no results or only one page, the log details from the homepage will be displayed on the results page. An empty or single-page result is often due to the website blocking our spider.



Other changes

We've changed our user agent header to imitate common browsers, as some websites were issuing an HTTP 403 Forbidden error. If your site still blocks us with a 403 we will try to detect this and notify you.

Gzip and deflate compression were not working, which caused excessive bandwidth use. We have now resolved this, which will improve performance.

This has enabled us to increase the maximum page size to 300Kb uncompressed, as some sites were not finding urls due to the size of their pages. We will issue a warning for such pages, as this is exceptionally large for a page.
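
The following Python sketch shows one way compressed responses and the page size limit could be handled. It is only an illustration: the function names are hypothetical, it assumes "300Kb" means 300 × 1024 bytes, and it assumes larger pages are truncated to the limit with a warning.

```python
import gzip
import zlib

MAX_PAGE_BYTES = 300 * 1024  # assumption: "300Kb" read as 300 x 1024 bytes


def decode_body(raw: bytes, content_encoding: str = "") -> bytes:
    """Decompress a response body based on its Content-Encoding header value."""
    encoding = content_encoding.strip().lower()
    if encoding == "gzip":
        return gzip.decompress(raw)
    if encoding == "deflate":
        try:
            return zlib.decompress(raw)  # zlib-wrapped deflate
        except zlib.error:
            # Some servers send raw deflate data without the zlib header.
            return zlib.decompress(raw, -zlib.MAX_WBITS)
    return raw


def prepare_page(raw: bytes, content_encoding: str = ""):
    """Return the uncompressed body (capped at the limit) and any warnings."""
    body = decode_body(raw, content_encoding)
    warnings = []
    if len(body) > MAX_PAGE_BYTES:
        warnings.append(
            f"page is {len(body)} bytes uncompressed; only the first "
            f"{MAX_PAGE_BYTES} bytes were parsed"
        )
        body = body[:MAX_PAGE_BYTES]
    return body, warnings
```

Because the limit applies to the uncompressed size, a page that is small on the wire can still trigger the warning once it has been decompressed.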

If your site uses JavaScript to detect bots and blocks our spider we will try to detect this and notify you. Specifically we're noticing a number of sites use the "testcookie nginx" module. We are working on a better solution to address this but it requires further investigation.

We've also been making some general improvements to our spider code, as well as some bug fixes, to try and make it more reliable and robust. This will hopefully benefit G-Mapper users once it rolls out to our Windows solution.

Support us


Don't forget we have now updated our service so that you can contribute and get some benefits. Please consider supporting us to help keep the project alive.

Support us.


Become a contributor and get extra benefits

Tuesday, 28 April 2020

We provide all our sitemap services for free and for the love of it, but running this service on advertising revenue and donations is difficult as they are unpredictable and in recent times they have not covered our costs.

In May 2018 new European Union Data Protection regulation had a significant impact and with recent world events we now find ourselves under significant financial pressure.

As a result we have brought forward plans to better manage our capacity and introduce our patron scheme which will allow people to contribute to the service and get extra benefits.

A basic free service will be maintained so that over 80% of our users will be able to continue with little or no impact while heavier users will be able to become a contributor.

Patron contributor tiers

Initially we will have 5 tiers of service which will suit the majority of our users. If you require additional services, please contact us.

All our contributors will benefit from:

  • No Google adverts.
  • No captcha human check, to simplify the user experience.

Each usage tier will allow for more pages, more websites and a greater number of spider sessions in a 28-day period, as follows:


  • Free - up to 250 pages for up to 1 website
  • Bronze - up to 500 pages for up to 1 website
  • Silver - up to 1000 pages for up to 5 websites
  • Gold - up to 2000 pages for up to 10 websites
  • Platinum - up to 5000 pages for up to 20 websites

Payments can be made using PayPal, which includes card transactions.
We will continue to review these tiers over the coming months.

For more details about the tiers please visit our contributor page.

More about our costs

We hope that by introducing this new model we'll be able to cover our basic costs, which include things like:

  • Domain names and SSL certificates
  • Web server and database hosting
  • Bandwidth and storage costs
  • Development and testing tools
  • Software licences
  • Email services
  • Marketing

We need thousands of supporters just to cover these, before we consider the time and effort invested in the service, so we really appreciate any support.

Our aims

Of course we want to do more to improve the service, and by becoming a contributor you can help us with:

  • Ensuring the existing service is financially sustainable.
  • Supporting upgrades to our hosting infrastructure.
  • Adding resilience and redundancy.
  • Investing in development and updates to the service.

We hope that these changes will ensure the future of the service and thank you for your continued support.

Upcoming service changes

Wednesday, 8 April 2020

Over the last few months we have been looking at the sustainability of our various sitemap solutions.

Since the introduction of new EU Data Protection legislation in 2018 our ability to generate revenue through advertising has become limited. With dwindling advertising revenue from current streams and limited donations we are increasingly finding ourselves reaching into our own pockets to fund the service.

We have been looking at options to allow the continued running and maintenance of the services as well as ways to invest in a number of key areas such as infrastructure, developer tooling, support materials as well as new services.

Having experimented with a number of models we believe that providing a patron model for a percentage of our community would be the best way forward, where we can generate more regular, reliable income and offer some basic benefits as a gesture of goodwill.

With recent world events surrounding the pandemic, the need to address these challenges is ever more pressing as we find ourselves under increased financial pressure. In order to address these issues we will be accelerating the implementation of these new services as well as implementing some cost cutting measures.

  • Reducing the number of pages that our spider will include in a sitemap as standard to reduce our infrastructure overhead costs.
  • Tweaking our fair use policy to ensure we can continue to serve the community at large.
  • Limiting the account features / saving of sitemaps
  • Introducing our patron scheme which will allow contributors to support the service and also get access to more features / resources.

The final details of what this entails are still being worked on, but from our initial modelling the good news is that for over 80% of our users the service will (to most intents and purposes) be unaffected.

The remaining users are, as we have identified, more frequent and higher volume users of the service. However, the patron model will allow them to go over these limits for a nominal contribution in line with our current donation model.

Further details and final arrangements will be published over the coming days, but we felt it was important to give you some notice of these incoming changes.

We thank you for using the service and for your continued support, and we wish you, your colleagues and family all the best during this difficult time.

What has been happening?

Thursday, 19 March 2020

First of all we're sorry we've been quiet for a while. The service is very much run on a volunteer basis and sometimes other priorities take over.

We rely on the support of our community and advertising revenue which of late has decreased due to stricter rules on the use of personal data. We thank you for your continued support.

What's new?


We've updated our WordPress plugin, having tested it with the most recent version of WordPress, and published a new version, 1.3.5, on which we would welcome input from our community.

1.3.5 should be considered a test release and not for use in a production setting.




Online sitemap generator

We've made a number of improvements to our service to improve compliance with data protection legislation, including:

  • Privacy policy updates.
  • Consent and use of cookies.
  • Automatic download of your data.
  • Right to erase your data

Some people use our service heavily and seemingly for commercial purposes. To help moderate use and make it fair, as well as keep our costs under control, we have implemented a fair use policy and will limit the number of times the generator can be used in a given period. We've been testing and tuning this over the last few months.

Most people won't hit this limit, but if you do, simply wait a few days and you'll be able to use the service again. We're hoping to introduce a premium service shortly for users with greater needs.

We've also fixed a couple of bugs. In particular, the bug which caused external links to be wrongly included in the sitemap when they were part of a redirect.
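
As an illustration of the kind of check involved (not the actual fix), the Python sketch below keeps a redirected url only when its final destination is on the same host as the site being spidered. The function names and the mapping of requested url to final url are hypothetical.

```python
from urllib.parse import urlsplit


def same_host(site_url: str, other_url: str) -> bool:
    """True when both urls share the same host name (case-insensitive)."""
    return urlsplit(site_url).hostname == urlsplit(other_url).hostname


def urls_for_sitemap(site_url: str, resolved: dict) -> list:
    """Keep only final urls that stay on the site after redirects.

    `resolved` maps each requested url to the url reached after redirects.
    """
    return [final for final in resolved.values() if same_host(site_url, final)]
```

A url on the site that redirects to another domain is therefore dropped rather than being written to the sitemap.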

G-Mapper

G-Mapper receives the same bug fixes as our online version and we've also updated some of the libraries and code to more recent versions. We've also addressed some of the color coding issues and user interface performance issues for larger sitemaps.

What's coming?


Over 2020 we're hoping to offer improvements to both our online sitemap generator and G-Mapper, including closer integration and more advanced features and options. We'll also officially release version 1.3.5 of our WordPress plugin once this has been tested further.

Don't forget to follow us on social media to stay up to date with news and updates.

G-Mapper updates

Tuesday, 9 July 2019

Sorry we've been quiet for a while. As a free service run on volunteers and donations it isn't always easy to keep on top of what is essentially another business!

We're pleased to say that we have recently been making a number of updates to the service and hope to bring you more over the coming months.

We recently updated G-Mapper to bring it in line with the online service as well as address a number of bugs including random crashing which has been an annoyance for a number of users.

Get the latest G-Mapper version.

New: Update to latest spider engine.
New: Update to make home screen more useful. 
Fix: Update screen freezes / crashes while spidering larger site.
Fix: Fixed timeout / size limit issue.
Fix: Fixed crashing while spidering due to update screen.
Fix: Grid does not resize when window resizes.
Fix: Improved date validation on editor.
Fix: Cannot reset modified date to null.
Fix: Updated out of date 3rd party dependencies.

We are sorry that we cannot support everyone directly and respond to all requests for help, but we are very grateful for your feedback. We do log issues and try to address them with each release so please continue to let us know.


What's been happening in 2018.

Thursday, 11 October 2018

Firstly an apology. Due to other commitments, maintenance and updates have been slow across the board and with recent increases in demand the service has struggled without ongoing maintenance.

We are now making the first of a number of incremental updates to move our service to an improved and more scalable (and hopefully lower cost) approach using serverless computing.

We've also updated the website terms and privacy statement to better align with new EU data protection requirements although in general we do not process personal data other than your email address for the purposes of operating the service and with your express permission.

Online Sitemap Generator Update

  • New : Serverless computing code and infrastructure update (see below).
  • New : Updated processing rules and limits to reduce wasted resource (see below).
  • New : Updated EULA and privacy statement.
  • New : Updated captcha / human detection.
  • Fix : Tweaked error logging to remove redundant content.
  • Fix : Error and log links on sitemap download page.
  • Fix: Image parsing unprotected "critical section" error.

Serverless computing


Instead of having fixed virtual servers, serverless computing allows specific functions to be executed in the cloud as part of a pooled resource, meaning they are only executing when required. This means we can reduce the number of virtual servers to those running the core services, while the spidering functions operate in the serverless space.

This should improve performance and in theory reduce costs as the compute is only being used and paid for when it is actually needed second by second.

New serverless spider architecture


Updated processing rules

  • Maximum of 2000 pages (including pages which errored).
  • Spider will run for a maximum of 60 minutes.
  • Individual page timeout of 20 seconds.
  • Download limit of 60K (for larger files the first 60K is downloaded).
  • Maximum of 20% of urls hitting the 20 second timeout.
  • Average request time across all urls must stay within 75% of the timeout (15 seconds), sampled every 25 urls.
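
To make the rules above concrete, here is a small Python sketch of how they could be expressed as a configuration and a stop check. It is an illustration under assumptions, not our production code: the class and function names are hypothetical, "60K" is read as 60 × 1024 bytes, and both ratio checks are assumed to be evaluated only at each 25 url sample point.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpiderLimits:
    max_pages: int = 2000                  # includes pages which errored
    max_run_seconds: int = 60 * 60         # 60 minutes
    page_timeout_seconds: float = 20.0
    download_limit_bytes: int = 60 * 1024  # assumption: 60K = 60 x 1024 bytes
    max_timeout_ratio: float = 0.20        # share of urls allowed to time out
    max_average_ratio: float = 0.75        # average must stay within 15 seconds
    sample_every: int = 25


@dataclass
class SpiderStats:
    pages_processed: int = 0
    timed_out: int = 0
    total_request_seconds: float = 0.0
    elapsed_seconds: float = 0.0


def stop_reason(stats: SpiderStats, limits: SpiderLimits = SpiderLimits()) -> Optional[str]:
    """Return why the spider should stop, or None if it may continue."""
    if stats.pages_processed >= limits.max_pages:
        return "maximum pages reached"
    if stats.elapsed_seconds >= limits.max_run_seconds:
        return "maximum run time reached"
    # Assumption: ratio checks only run at each sample point (every 25 urls).
    if stats.pages_processed == 0 or stats.pages_processed % limits.sample_every:
        return None
    if stats.timed_out / stats.pages_processed > limits.max_timeout_ratio:
        return "too many urls timed out"
    average = stats.total_request_seconds / stats.pages_processed
    if average > limits.max_average_ratio * limits.page_timeout_seconds:
        return "average request time too high"
    return None
```

For example, after 25 urls with 6 timeouts (24%) the check would report that too many urls timed out, whereas 5 timeouts (exactly 20%) would still be allowed.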


What's to come?


Some of the updates to the spider will propagate to the next release of G-Mapper, which will increase the performance and stability of the software.

We also have a new update to the WordPress plugin in the pipeline, which will likely be released towards the end of 2018 / early 2019.

Stay in touch and keep us updated with any issues.

Latest updates

Thursday, 17 August 2017


Online Sitemap Generator


New : verbose logging output.
New : Increased the page size we will parse.
Fix : Last modified date not captured from server.
Fix : Page excluded when child page errors.

G-Mapper

Includes the online sitemap generator fixes as well as:
Fix : crashing on load when version check cannot reach internet

Known issues : Random crashes.
May sometimes be caused by a "Faulting module name: clr.dll" error in Microsoft .NET Framework 4-based applications. A solution may be to update your version of .NET 4.
Download latest .NET 4.x