Saturday, 15 February 2014

Improved support for character encoding and redirects

Character encoding

We've improved our spider so that it can cope with a wider range of character sets including Arabic and Chinese.

For this feature to work correctly, we need to be able to determine your website's encoding. Otherwise our spider won't interpret your pages correctly, and your sitemap will contain strange characters and symbols.

http://www.w3schools.com/tags/ref_charactersets.asp
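
As a rough illustration of what "understanding your website encoding" can involve, here is a minimal Python sketch that looks for a declared charset, first in the HTTP Content-Type header and then in a meta tag. It is not our spider's actual code; the function name declared_charset and the UTF-8 default are assumptions made for this example.

```python
import re
from email.message import Message


def declared_charset(content_type_header, html_bytes, default="utf-8"):
    """Return the charset a page declares (hypothetical helper).

    The HTTP header is checked first, then a <meta> tag near the top of
    the document. The default is an assumption for this sketch.
    """
    if content_type_header:
        # Reuse the stdlib header parser for "text/html; charset=...".
        msg = Message()
        msg["Content-Type"] = content_type_header
        charset = msg.get_content_charset()
        if charset:
            return charset

    # Look for <meta charset="..."> or the older http-equiv form
    # in the first 1024 bytes of the page.
    head = html_bytes[:1024].decode("ascii", errors="ignore")
    match = re.search(r'<meta[^>]+charset=["\']?([\w-]+)', head, re.IGNORECASE)
    if match:
        return match.group(1).lower()

    return default
```

If neither the header nor the markup declares a charset, a crawler has to guess, which is where strange characters tend to come from.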

Improved HTTP 301 and 302 redirect handling

We now follow HTTP 301 and HTTP 302 redirects automatically. In addition, if the spider detects a 301 or 302 when it first hits your home page, we work out what the base domain for your website is.

Coupled with the improved pattern matching (see below), we can now determine your primary domain and match more variations to produce a complete XML sitemap with canonical URLs.
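
To show the idea of working out a base domain from a home page redirect, here is a small Python sketch. It is only an illustration under stated assumptions, not our implementation: base_after_redirect is a hypothetical helper, and it takes the status code and Location header as plain arguments rather than making a real request.

```python
from urllib.parse import urljoin, urlsplit


def base_after_redirect(requested_url, status, location):
    """Return scheme://host for the site after an optional redirect.

    Hypothetical helper: 'status' and 'location' stand in for the
    response to the first request for the home page.
    """
    if status in (301, 302) and location:
        # The Location header may be relative, so resolve it
        # against the URL that was originally requested.
        final_url = urljoin(requested_url, location)
    else:
        final_url = requested_url

    parts = urlsplit(final_url)
    return f"{parts.scheme}://{parts.netloc}"
```

For example, a 301 from http://example.com/ to https://www.example.com/ gives a base of https://www.example.com, while a relative redirect such as /home leaves the base unchanged.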

Improved pattern matching

If you enter a single domain we will now check for more patterns:

http://www.xmlsitemapgenerator.org
https://www.xmlsitemapgenerator.org
http://xmlsitemapgenerator.org
https://xmlsitemapgenerator.org
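
The four variations listed above can be generated from a single entered domain. The following Python sketch shows one way to do that; domain_variants is a hypothetical name, and treating a leading "www." as the only subdomain variation is an assumption of this example.

```python
from urllib.parse import urlsplit


def domain_variants(url):
    """List http/https and www/non-www forms of a domain (hypothetical helper)."""
    # Accept input typed with or without a scheme.
    if "://" not in url:
        url = "http://" + url

    host = urlsplit(url).netloc.lower()
    # Strip a leading "www." so both forms can be rebuilt from the bare host.
    bare = host[4:] if host.startswith("www.") else host

    return [
        f"{scheme}://{prefix}{bare}"
        for prefix in ("www.", "")
        for scheme in ("http", "https")
    ]
```

Whichever form is entered, the result is the same four addresses, so links found in any of them can be treated as belonging to the same site.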

When we create your sitemap we will use the version that you entered in the settings, so make sure this is the correct primary domain for your website.

Fixed https validation

You can now enter an https:// address without getting an invalid URL error.

We also now try to handle badly constructed URLs better, such as:
xmlsitemapgenerator.org///directory/test.html
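
As an illustration of the kind of clean-up involved, the Python sketch below adds a missing scheme and collapses repeated slashes in the path. It is a simplified example rather than our actual handling; tidy_url and the default of http are assumptions.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def tidy_url(url, default_scheme="http"):
    """Add a missing scheme and collapse repeated slashes in the path.

    Hypothetical helper; the default scheme is an assumption.
    """
    if "://" not in url:
        url = f"{default_scheme}://{url}"

    parts = urlsplit(url)
    # Only the path is rewritten, so the "//" after the scheme,
    # the query string, and the fragment are left untouched.
    path = re.sub(r"/{2,}", "/", parts.path)

    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

With this sketch, the example above becomes http://xmlsitemapgenerator.org/directory/test.html.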