Please help - कृपया मदद - Por favor ayuda - 请大家帮帮忙

Friday, 14 November 2014

Did we get that right?!

We're in the process of planning our multilingual sitemap generator and need help from our users who are multilingual.

Initially we're thinking about focusing on Spanish, Mandarin and Hindi, but would welcome your input on what other languages we should cover.

The first phase will be to translate the homepage, which includes the sitemap configuration form, the waiting page and the download page. We've compiled the key words and phrases in this translation form.

If you would like to get involved and volunteer, please contact us and let us know which languages you can help out with.



Sitemap generators compared

Monday, 13 October 2014

We occasionally scout the web to see what else is out there. Here is a quick side-by-side comparison of the top 5 online sitemap generators on Google:

  1. XmlSitemapGenerator.org
  2. XML-Sitemaps.com
  3. web-site-map.com
  4. freesitemapgenerator.com
  5. sitemapdoc.com



Feature             1            2          3         4            5
Ease of use         +++          ++         ++        +            +
Configurability     +++          ++         ++        +            +
Must register       No           No         No        Yes          No
Free usage          2000 pages   500 pages  Variable  5000 pages   500 pages
Can save settings   Yes          No         No        No           No
Image sitemaps      Yes          No         No        No           Yes
XML, RSS & HTML     Yes          Yes        No        No           Yes
Filtering rules     Yes          No         No        No           No
Link checker        Yes          No         Yes       No           No



User feedback and reviews

Saturday, 13 September 2014

Your feedback, positive and negative, is really helpful: it's what keeps us focused and helps improve the service. We thought we would share some of our favorite user comments:

"It is easy to use no hassle unlike other online generator, it exactly direct to the point. No tedious forms to register" - https://coyatz.com/wp

"I tried a couple of other sitemap generators but they were both rejected by Google. Yours was accepted. Thank you for your service." http://www.jira.co.nz

"Good and complete sitemap. It is a plus for the images in it!!" - http://www.steigerhout-meubelbouw.nl

"Really excellent. You identified bad links I was not aware of, and that is very helpful. Originally the Coffee Cup program I used for sitemaps generated XML, but now it's only HTML. I will be using you for more of our websites. Thanks for this service."  - http://www.clubsanook.com

"Very helpful tool for quickly getting a sitemap up. Thanks! Multiple formats appreciated." - http://www.deck-gates.com

"Wonderful blog site. Very rich in contents. A wide range of topics are covered. Wish to see more articles from them." - http://www.hnncommunications.com

"Nice en clean, works perfectly with google webmaster tools" - http://praktijkdriessen.nl

"This one works . Very Nice a site map generator online that actually works !! I love it !!!!" - http://nicepantyhose.com

"Quick, thorough and reliable. I have used this service for years and have it bookmarked on all my computers." - http://www.saloncabochon.com

"XmlSiteMapGenerator.org is easy to use - it helps you learn what sitemaps are, what options are allowed, and also generates other useful (and educational as well). Clean pages, not too Busy (I hate BUSY pages that are hard to get through)." - http://www.dpsfiles.com

"Excellent service, I'm using this service on a regular basis.   I definitely would recommend xmlsitemapgenerator.org to anyone!  Good job Guys/Gals for the good work. :)" - http://uslowwebsite.com

We plan to make all our ratings and feedback available online shortly.

Minor update to the 2000 page limit

Thursday, 11 September 2014

A short time ago we increased the sitemap page limit from 1000 to 2000. This limit applies to the number of pages in a sitemap.

We sometimes find that larger websites have many errors, in some cases tens of thousands of links returning errors such as a 404 Not Found. In these cases our spider was often crawling tens of thousands of pages before it hit the 2000-page sitemap limit.

Whilst these sites are not too common, when we do encounter them they can cause a logjam during busy times which blocks other sitemaps from processing.

To address this we have now capped the maximum number of pages spidered at 2000. So if you have tens of thousands of URLs, the spider will stop once it has processed 2000 pages, even if some of them returned errors.

For the majority of sites this will make little difference; we just thought we would let you know.

Current performance issues

Wednesday, 3 September 2014

The last few months have seen a 40% rise in traffic and subsequent sitemap generation requests. This is great, but it also means that the service is heavily loaded and performing more slowly at peak times.

We're currently looking at potential improvements that can be made to the application and hardware upgrades, and will hopefully have more news soon.

As I'm sure you'll appreciate, providing a service like this can't be paid for by our passion alone. In the more immediate term we will be adding a donate button to the website.

We hope this will raise some funding to help purchase the performance tools and hardware needed to improve the service. If everyone who used the service paid just £1 (about $1.65 or €1.25) it would help keep the show on the road. We don't expect everyone to contribute as it is a free service, but if you find it useful and are able to contribute, every little helps.

In the meantime please accept our apologies for any delays, and please bear with us.

Update - 6th September 2014

We've made some minor hardware upgrades so we have a few more CPU cores to play with. This should help ease some of the congestion in the short term while we work on other improvements.




New gTLD support and URL limit increased

Monday, 1 September 2014

We've updated the domain validation to allow for new gTLDs (generic Top Level Domains), which are longer than the more traditional ones. So whether you are a .com or a .photography, you can use the sitemap generator.

As well as this, we have increased the sitemap URL limit from 1000 to 2000 to cater for larger websites.

HTTPS/SSL Google SEO and XML Sitemaps

Friday, 8 August 2014

Google recently announced that websites served over SSL/HTTPS will gain a minor ranking boost, in an effort to improve online security. This gain is likely to be small to start off with, to give webmasters time to adapt their servers and websites, but over time this boost is likely to increase, making SSL a key part of your SEO strategy.

What do you need to do?


A key part of updating your website is ensuring you have an effective transition plan. This will include notifying search engines and directories of your new website structure. Sitemaps should be an element of this SEO strategy:

  • Get an SSL certificate with a 2048-bit key.
  • Apply this to your server.
  • Update your link structure - hopefully you are already using relative URLs.
  • Implement HTTP 301 Moved Permanently redirects to send traffic to your secure site (see the sketch after this list).
  • If you can't do this you may need a canonical meta tag.
  • Update your Google XML Sitemap to use your new HTTPS address.
  • Update Google and other search engines with your new sitemap.
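
As a minimal sketch of the redirect step, assuming an Apache server with mod_rewrite enabled (other web servers have equivalent settings), a site-wide HTTP-to-HTTPS 301 in .htaccess might look like this:

# Hypothetical example: send all HTTP requests to the HTTPS equivalent with a 301
RewriteEngine On
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}/$1 [R=301,L]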

 

Did you know?


You can access XML Sitemap Generator using HTTPS. In light of the news from Google we will be updating to make this the default over the next few days.

HTTPS / SSL costs less than you think ...

 

We use 123-reg for hosting, SSL and domains. A basic SSL certificate for 2 years costs only £21.58 (about 36.20 USD or 27 Euros), making it an SEO no-brainer!

 

Wordpress Sitemap Generator

Wednesday, 6 August 2014

If you need a sitemap for WordPress, try our new (July 2015) WordPress Sitemap Generator Plugin. It will give you much more control over your sitemap and integrates directly into WordPress.

Key features :

  • Supports HTML, RSS and Google XML Sitemap formats.
  • List all your WordPress Pages, Posts, Archives, Authors, Categories and Tags.
  • Includes paged links for posts for Archives, Authors, Categories and Tags.
  • Set global sitemap default values for priority and update frequency.
  • Set sitemap priority and frequency values at the category, tag, post and page level.
  • Automatic daily ping when you have updated pages / posts in WordPress.
  • Add latest pages / posts RSS feed to page header.
  • Updates your Robots.txt file with sitemap entries.

Of course you can still use our regular XML Sitemap Generator with WordPress if it offers you a quicker and simpler alternative:

  • Simply point us at your WordPress website.
  • We'll spider your website content.
  • Download your sitemaps from our website.
  • Upload them to your WordPress account.

Don't forget that by default the WordPress dashboard limits the types of files you can upload, and not all hosting providers support all file types. Your system administrator can update WordPress to allow you to upload XML and HTML sitemaps to your account. Alternatively, you can use a plugin that enables richer file upload capability.

Baidu, Yandex sitemaps and fixes

Monday, 4 August 2014

Our sitemaps have always been compatible with Yandex and Baidu. We've now added them to the ping feature for RSS sitemaps so that you can easily notify them about your sitemap updates. If you have other services that would be useful please feel free to get in touch.

Some users were experiencing problems with the "download all" zip file containing empty files. We believe that issue is now resolved. Please let us know if you experience any more problems.


What is the best sitemap format?

Tuesday, 15 July 2014

Our sitemap generator produces a number of sitemap formats including HTML, XML and RSS, but which is the right one? The answer is all of them, depending on the circumstances.

Here are some quick thoughts on why you might use each format.

HTML Sitemaps

 

An HTML sitemap was traditionally there to help users find pages in your website, but they are also great for helping search engines find your pages organically. This is particularly true for websites with deep, complex structures.

An HTML sitemap can be linked relatively high up in your site structure and provides a direct route to pages no matter how deep they sit in your website structure.

 

XML Sitemaps

 

XML (Extensible Markup Language) Sitemaps are human and machine readable, although they are generally targeted at search engines. XML sitemaps are a great way to provide information beyond a list of links, including details about when and how often pages are updated and how important they are. They give search engines more hints about the content of your website.
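
For illustration, here is a minimal sitemap in the standard sitemap protocol format; example.com, the date and the values are placeholders:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2014-07-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>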

XML Sitemaps are supported by a number of the major search engines, including Google, Bing, Yandex and Baidu.

 

RSS Sitemaps

 

RSS sitemaps use XML as well, but contain different content to an XML Sitemap. They include more narrative about each page, including its title, description and when it was last updated.
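
As an illustration, a minimal RSS 2.0 feed used as a sitemap might look like this; example.com and the content are placeholders:

<rss version="2.0">
  <channel>
    <title>Example Site</title>
    <link>https://example.com/</link>
    <description>Latest pages from example.com</description>
    <item>
      <title>About us</title>
      <link>https://example.com/about</link>
      <description>Who we are and what we do.</description>
      <pubDate>Mon, 04 Aug 2014 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>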

Many search engines also support RSS sitemaps. Webmasters tend to favor XML sitemaps when working with search engines, but RSS sitemaps can prove a very useful tool in your SEO toolkit.

RSS sitemaps are generally more widely accepted by blog searches and directories because RSS is primarily used for content syndication. The result is that ping tools, such as our sitemap generator ping tool, can automatically distribute them to more services.

This means you can potentially distribute your website updates more quickly and widely than with an XML sitemap. Due to their narrative nature, RSS sitemaps can also help with external link building.

Summary

 

Each sitemap format has its role and can be equally valuable. With all formats being easily created using our sitemap generator, it makes sense to get the maximum benefit by using as many sitemap formats as possible to distribute your website content and updates.

Sitemaps and multiple domains

Tuesday, 1 July 2014

We sometimes get asked why our sitemap generator doesn't find all the pages in a website. There can be a number of reasons, but one we have found is references to multiple domains within the same website structure, in particular homepages that reference a different domain to the one the user specifies for their sitemap.

We recommend you are consistent with your domains. Pick a primary domain and stick to it. If you have secondary domains by all means use them, but make sure your website structure uses the primary domain. This will make it clearer to our spider and search engines where your pages are and what the structure of your site is.

A concrete example: if you have two domains pointing to the same website and use full absolute links in your pages, avoid mixing the domains and, where possible, just use the relative path.

e.g. if you have mysite1980.org.uk and mysite1980.org pointing to the same site, avoid doing this:

<a href="http://www.mysite1980.org.uk">Home Link 1</a>
<a href="http://www.mysite1980.org/aboutus">About</a>
<a href=""http://www.mysite1980.org.uk/features">Features</a>
<a href=""http://www.mysite1980.org.uk/contact">Contact us</a>


We also see some websites framing another site. We assume people do this to masquerade the site under another address. The better way to achieve this is a DNS CNAME or an HTTP 301 redirect, depending on your circumstances and needs.

If you frame one domain in another, our spider won't recognise that the two domains are the same website.

Remember it is perfectly acceptable to have more than one sitemap, one for each domain / website. But where the domains all point back to the same website you should make sure you have a good HTTP redirect strategy, or make use of canonical URLs, to ensure that users end up in the correct place and that search engines don't penalise you for duplicate content.

And of course, if you don't supply the correct address to the sitemap generator, and your canonical URLs and redirects aren't in place, you risk it not being able to find some or all of your pages.


What is a canonical page?

Saturday, 21 June 2014

A canonical page is the preferred URL of a page or set of pages with duplicate or highly similar content.

By adding rel="canonical" to your pages you can tell search engines about the preferred content / page for a given page or set of pages:

<link rel="canonical" href="http://xmlsitemapgenerator.org"/>

 

Why use it?

The rel="canonical" element helps search engines reduce duplicate content. To help de-duplicate your content and prevent yourself from being seen to publish duplicate content you should include a canonical url for similar/the same pages.

 

When to use it

  • Sites with dynamic URLs which can lead to the same content.
  • eCommerce sites, especially product listings where the same content may be displayed in various sort orders, or products with multiple pages for minor variations (e.g. colour) - see the example below.
  • Syndicated content, to point back to the original content.
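
For instance, an eCommerce listing reachable under several sort orders might declare a single preferred URL; example.com and the paths are placeholders:

<!-- on https://example.com/shoes?sort=price and https://example.com/shoes?sort=name -->
<link rel="canonical" href="https://example.com/shoes" />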

 

When not to use it 

  • If your page has moved or there is a preferred version, your first choice should be an HTTP 301 redirect. This not only tells the search engine the correct page, but will also redirect the user to the correct version of the page.
  • The same applies if you are changing your website structure: use HTTP 301 redirects.

 

How we use it

The sitemap generator only includes your primary pages. If we discover a page with a canonical attribute pointing elsewhere, we will drop the current page from the sitemap and move on to the URL specified in the canonical element.



Filter and save your sitemap

Sunday, 15 June 2014

In this latest release we've added features to make it easier for you to filter out URLs and images you want to exclude from your sitemap, using pattern-based matching, as well as the ability to save your sitemap settings.

Filtering

When using "more settings" option you can enter patterns of URLs to exclude for files and images.

You can use simple filter strings as illustrated below, or more complex regular expressions (see the hypothetical example after the list).
  • /images/Temp/*
  • /TempFiles/*
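
As a purely hypothetical illustration of the regular expression option (the exact syntax the form accepts may differ), a pattern like this could exclude any URL carrying a session ID parameter:

.*[?&]sessionid=.*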

[Screenshot: sitemap filter settings]

Saving sitemap settings

Another handy feature we have added is the ability to save your sitemap settings. If you've spent a lot of time and effort setting up the more advanced sitemap settings you don't want to have to re-enter them every time you update your sitemap.

[Screenshot: the save sitemap option]


You'll then be asked to log in or create an account. Your sitemap settings will be saved to your account so you can edit and re-run them whenever you update your website. You can add multiple sitemaps to your account, enabling you to manage them all in one place.



Give it a try and generate a Google XML Sitemap.

What happened to the advanced sitemaps feature?

Saturday, 31 May 2014

The advanced sitemaps feature was one that allowed you to generate your sitemap and then edit it online. This feature was a beta prototype which we discontinued some time ago.

We embarked on a revamp of our website and spidering engine some time ago, and maintaining the advanced editor alongside the new code became inefficient. We took the decision to focus on our key services and get them right.

We are about to introduce more advanced settings and saving of sitemaps again and will be adding new editing features shortly.

Sorry if this has caused you inconvenience; however, we believe it is for the long-term good of the service.


More XML Sitemap Settings

Tuesday, 13 May 2014

Did you know that if you choose the "More settings" option you can get more control over how our spider will index your website and generate your sitemap?

Give it a try and find out for yourself!





How do meta robots noindex and nofollow influence my sitemap?

Sunday, 4 May 2014

We try our best to process URLs in the same way search engines do. This article gives a quick overview of how the meta robots tag influences your sitemap.

index, nofollow


Applies to:
<meta name="robots" content="index, nofollow" />
<meta name="robots" content="nofollow" />

We include this page in your sitemap, but do not follow any of the links on it.


noindex, follow


Applies to:
<meta name="robots" content="noindex, follow" />
<meta name="robots" content="noindex" />

We do not include this page in your sitemap, but we do follow the links on it.


noindex, nofollow


Applies to:
<meta name="robots" content="noindex, nofollow" />

 We do not include this page in your sitemap and do not follow any of the links on it.

others


Any other values do not influence the sitemap and pages will be processed and included as normal.


Improved HTML sitemaps format

Wednesday, 30 April 2014

We've improved and simplified the HTML sitemap feature.

Your pages will now display in a simplified hierarchy based on the order in which our spider finds them, instead of a flat list.

We also removed the URL to keep the list simpler.



New responsive XmlSitemapGenerator website

Sunday, 27 April 2014

If you are a regular user of XmlSitemapGenerator.org you'll notice that we've made some changes to our website. The big change is the responsive design, which works better across different screen sizes; as a result you now get a much friendlier look and feel, even on a phone.


[Screenshots: the site before and after the redesign, on mobile and in a desktop browser]
Achieving this kind of functionality is surprisingly easy. We use the Zurb Foundation framework and would highly recommend you check it out!



Do XML sitemaps improve SEO (Search Engine Optimisation)?

Wednesday, 16 April 2014

There is a great divide within the SEO community on the value of submitting sitemaps to search engines.

One thing is clear: a poor website navigation structure won't help search engines find your pages, and your users may not be so impressed by it either!

So why might webmasters consider creating an XML sitemap?
  • It could help search engines figure out canonical URLs, although an HTTP 301 redirect is often better.
  • The last modified date field in the sitemap file helps search engines understand which pages are changing and when, faster than an organic crawl of your website.
  • Information about the pages, such as priority and change frequency, can help search engines understand and target relevant areas of your website.
  • They can also help ensure the validity of links and URLs in your website as part of the generation process.
As with SEO itself there are some grey areas subject to fierce debate, yet there is certainly some validity to using XML sitemaps. The fact that Google and others maintain systems to support them does suggest they hold some value, and sitemaps are supported in the Google and Bing webmaster tools.

Perhaps most significantly, with a free XML sitemap generator available, why not at least give it a try and measure the results yourself?

Sitemap generation performance issues

Sunday, 13 April 2014

We would like to apologise for the variable performance over the last few days. We believe we are now starting to get it back under control.

The recent improvements created some additional problems when dealing with slow websites and hung connections. Some websites started locking up resources that were not being released quickly enough to serve other sitemap requests.

To address this we have added a number of controls to the existing list, which are summarised below.

  • Max of 1000 pages (existing) - this was recently increased from 250 to 1000.
  • Page size of 80K (updated) - this was 100K, but to improve performance we had to bring it down. The majority of websites should be unaffected by this.
  • Response time of 30s (new) - your website must respond to a request and send all of the page content within 30 seconds. For the majority of websites this should not be an issue.
  • 3 concurrent downloads (new) - this has been increased from 1 to 3 to improve sitemap generation speed.

We are continuing to monitor the situation and make further improvements over the coming days.

Faster 1000 page XML sitemaps and still free!

Monday, 7 April 2014

1000 pages 


We are pleased to announce a new milestone! We have increased the sitemap file limit to 1000 URLs, and best of all the service remains free to all.

To help enable this we have also been working on the system performance and have introduced a couple of new features to aid this transition.

Faster parallel page processing


We now scan ahead and download multiple pages in parallel so that the spider engine is not kept waiting for pages. This means we can process pages faster. It will put a slightly increased load on your server, but this should not be a problem for most websites. If you wish to disable the feature, simply click on advanced settings and un-tick the "Multi threading" check box.


100Kb max page size


One limitation we have introduced is on web page file size. The majority of users will not be affected, but any web page that exceeds 100Kb will only be partially downloaded, to protect resources. If your pages are over 100Kb we would strongly urge you to fix this: it is very large indeed, and even 50Kb would be excessive, especially in the mobile world.

Fixes

There was an issue with processing iFrame URLs which has now been fixed.

We've also done some work to help prevent our spider tripping over bad HTML, as this is where most of our low ratings come from.


Improved HTML spidering (Robots, Canonical, Rel)

Friday, 28 March 2014

We now parse a number of HTML elements to better understand your website and which files should be in your sitemap.

Canonical URLs


We now detect the link rel="canonical" tag.

 <link rel="canonical" href="http://xmlsitemapgenerator.org"/>

Where we detect this tag and it points to another page, we will not include the current page in the sitemap and will instead spider the URL specified in the href attribute of the tag.

Meta robots 

 

We now obey the meta tag for robots.

<meta name="robots" content="noindex, nofollow" />
<meta name="robots" content="noindex" />
<meta name="robots" content="nofollow" />

Where a noindex or nofollow value is detected we will not index the page or will stop following URLs on it, respectively.

Anchor rel attribute

 

We now obey rel="nofollow" in anchor tags:

<a rel="nofollow" href="/index.aspx" />

As with the meta robots tag, if we detect a nofollow value we will not follow the URL.

 


Text, HTML sitemaps, Robots.txt and more

Sunday, 23 March 2014

This version includes some new updates that people have been asking for, including a number of new sitemap formats.

We recommend that you have a valid robots.txt file, an XML sitemap and an HTML sitemap in your website root folder to optimise your sitemap coverage.

[Image: XML and HTML sitemaps, and robots.txt]

HTML sitemaps

The great thing about an HTML sitemap is that once you publish it, any search engine can deal with it, whether they officially support sitemaps or not. At the minute we list all URLs in alphabetical order. If you have any suggestions to improve this, please feel free to contact us.

Text Sitemaps

We also added text sitemaps to the list of files; a text sitemap is a really simple list of URLs, as shown below.
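
For example, a text sitemap is simply one URL per line; example.com is a placeholder:

https://example.com/
https://example.com/about.aspx
https://example.com/contact.aspx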

Robots.txt

We also produce a robots.txt file for you which you can upload to your website (or use to modify your existing robots file).
The robots.txt file makes it easier for search engines to automatically discover your sitemaps.
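As a minimal sketch (example.com and the file names are placeholders), the sitemap entries in a robots.txt file look like this:

User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap.rss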

Other changes

We've also improved the error report, changing it from an XML file to a standard HTML page to make it easier to understand, and refined our quick guide to publishing a sitemap.


How long does it take to generate my XML sitemap?

Friday, 14 March 2014

Our spider can sometimes take a while to process your website, and people ask how long they should wait. The time to generate a sitemap can vary quite dramatically, from a few seconds to over 10 minutes, and is influenced by a number of factors.

 

Key Influencing Factors

 

Your website performance

The key limiting factor will usually be how fast your website / web server can respond to our spider's requests. If your web server is on the other side of the world and/or slow to respond, this will delay our spider. Your website may seem fast to you if you are geographically close to it, but it may not seem so to our spider.

 

Size of pages

Our spider has to download and process every page to find links. The smaller and more efficient your pages, the faster we can spider them. If your pages are large it will take longer to download them.

 

Number of pages

The more pages you have on your site, the longer it will take, and a slow server or lots of large pages compounds this.

For example, if you have 10 pages that take 5 seconds each, it will take at least 50 seconds to complete. If you have 200 pages at 2 seconds each, it will take at least 400 seconds (6.7 minutes).

 

Our service load

During busy times our spider may be indexing lots of websites, and we have limited CPU, memory and bandwidth. This can cause the service to run more slowly.

 

Recommendations

 

Try to ensure your web pages are well designed and as lean as possible to aid the spidering of your website.

Ensure that you are not throttling the browsing of your website for our user agent:
"XmlSitemapGenerator - http://xmlsitemapgenerator.org"


Most importantly, when you generate your XML sitemap ensure you enter your email address. That way, if there is a delay, you will be notified by email when your sitemap is ready.


Home page redirects fixed

Tuesday, 11 March 2014

Some homepage redirects were causing problems for our spider and resulted in sitemaps with no files for a small number of users. We believe this is now resolved. Thanks for the feedback.

Why do you limit the number of URLs in a sitemap?

Sunday, 9 March 2014

We get asked this quite a lot.....

The main reason is that XmlSitemapGenerator is a free tool and generating sitemaps is not a free process.

Our spider indexes thousands of pages a day, utilizing lots of server resources (memory, CPU and bandwidth) and racking up gigabytes of data as it indexes pages and builds up profiles.

The overhead of the spidering process is quite large, especially at busy times when many people are generating sitemaps. To help ease the pressure we have a queuing mechanism to avoid a logjam on the server, but that means people then have to wait.

Therefore, to control resource utilisation and minimize wait times, we limit the number of URLs our spider will crawl for a given website.

Over time we have increased the number of URLs we accept from 50, to 100, and at the time of writing we now support 250. Of course we constantly review this and may decide to raise or lower it in the future.

Our statistics show that 99% of users of our free XML sitemap generator don't hit this limit, and many don't even come close, so for now we are happy that the tool is fit for purpose.

 

What if I have more URLs?

Generally, if you have a website that comprises more than 100 pages you are probably using some sort of content management system (CMS) or an eCommerce system. With such systems there are much better ways to create sitemaps faster and more effectively, directly from the database. Many systems support adapters and plugins to help you generate your sitemap.




Improved download and error reports

Sunday, 2 March 2014

We've made some improvements to the sitemap download page to make it easier to download your sitemaps.
  • Firstly, we've made all the files available as a single zip file download.
  • We have also added a simple table that gives you access to your XML sitemap, your RSS sitemap and a new error report.



Problems creating your XML sitemap?

Sunday, 23 February 2014

Some users experience problems when they create an XML sitemap because of how their website is implemented and hosted. Here are some common problems with websites that result in inconsistent XML sitemaps.
  • Server / performance issues
  • Inconsistent URLs / domains
  • iFramed homepage
  • Bad header tags
  • Incorrect server response
  • Page size too large
  • No "real" links / non-native behaviours
  • Inconsistent behaviour for different user agents / browsers
  • Poor HTML mark up
  • Incorrect use of character sets
  • Incorrect modified date header


Server / performance issues


Some webservers are slow to respond, throw errors, or crash. Your webserver needs to be responsive and able to serve pages quickly. We limit the amount of time our spider will wait for a given page and the total time to create a sitemap. If your server is slow or producing errors, our spider will fail to complete your sitemap.

Inconsistent urls / domains


Our spider only covers the current domain. If you have multiple domains in your website our spider will not follow them. This can be subtle in terms of http vs https and addresses with or without www.

<a href="http://xmlsitemapgenerator.org/index.aspx">home</a>
<a href="http://www.xmlsitemapgenerator.org/about.aspx">about</a>
<a href="http://www.xmlsitemapgenerator.org.uk/contact.aspx">contact</a>

Always use the same domain throughout your website; even better, use relative URLs:

<a href="/index.aspx">home</a>
<a href="/about.aspx">about</a>
<a href="/contact.aspx">contact</a>

This goes for links, framesets, image maps, etc., as the sitemap spider works within the context of one domain and so will ignore URLs on other domains.

iFramed / framed pages


Problems with iFrames are usually related to inconsistent URLs / domains, as above.

e.g. if your website is https://xmlsitemapgenerator.org and your homepage has an iFrame like the one below, our spider will not get beyond your homepage.

<iframe src="http://www.xmlsitemapgenerator.org"></iframe>

To resolve this, ensure the domains match or use relative URLs. If you cannot do this, use the address from the iFrame to create your sitemap.

Bad header tags


If you use the canonical URL meta tag or noindex / nofollow, make sure you use them correctly. We have seen cases where people have a noindex / nofollow on their homepage! Our spider will ignore pages with a nofollow / noindex, and this is generally very bad for SEO!

Similarly, a bad or circular canonical URL sends our spider round in circles until it gives up. We essentially treat canonical tags as redirects.

Incorrect server response


For a page to be included in your sitemap it must return a 200 or 206 success code. If it returns anything else it will be excluded.

Misconfiguration or server errors are usually the cause of problems, e.g. issuing a 404 incorrectly, a bad 302 redirect, or a server error such as a 500.

Page size too large


For performance reasons we limit the size of content that our spider processes per page. If your page is very large we will truncate it. If we truncate it before finding any URLs, the spider will fail to get beyond the given page.

This usually occurs if you have a lot of header content, such as embedded CSS and JavaScript. It's generally good practice to separate these out into external files to enable better caching and reduce page sizes, e.g.

<link rel="stylesheet"  href="myCssFile.css"/>
<script type="text/javascript" src="/myJavascriptsFile.js" ></script>

No "real" links / non native behaviours


Some websites use Flash and JavaScript for links. Remember that spiders and search engines struggle with these, and they are not particularly accessible to users with disabilities. You should always cater for native link behaviour, especially on your home page, to ensure that all visitors and spiders can find their way into your website.

These problems can occur when using some JavaScript frameworks that render pages or manage navigation outside native HTML "a" tags and href attributes.

You should also follow best practice and native behaviour for carrying out common tasks. For example, if you want to do redirects, use the correct meta tag or response headers. Patterns such as document.location.href = 'index.asp' will prevent our spider finding your pages.
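
For instance, a crawlable alternative to the JavaScript pattern above is the meta refresh tag (an HTTP 301 redirect is better still); the URL is a placeholder:

<meta http-equiv="refresh" content="0; url=/index.asp" />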

Poor HTML mark up


Not all web design software, and indeed not all web designers, create high quality HTML. Some basic tips:

Always enclose your HTML attributes in quotes.
<meta name="description" content="something" />

Don't use unnecessary spaces in markup:
<meta name = "description" content = "something" />

Don't forget to include closing tags
<a href="/contact.aspx">contact</a>
<div>Hello World</div>

Use the correct syntax for URLs. We quite often see things like this:
<a href="///contact.aspx">contact</a>


You can use a free HTML / XHTML validator to help check your HTML.
http://validator.w3.org/

Inconsistent behaviour for different user agents / browsers

We've noticed that some websites redirect browsers but don't redirect our spider, or vice versa. For example, when we visit your website in a browser you redirect to a page or folder, but your server presents a different page to our spider. Some users are not aware of this page, or had forgotten about it, and its links are often broken or out of date.

Similarly, links and content can be presented differently. Always treat our spider as you would a browser to get the sitemap you expect.

Note you can detect our sitemap spider from our user agent string:

"XmlSitemapGenerator - http://xmlsitemapgenerator.org"


Incorrect use of character sets

Make sure you are using the correct response header character set. This is particularly important if you are using non-Latin character sets, including Arabic, Chinese, etc.
http://www.w3schools.com/tags/ref_charactersets.asp
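
For example, to declare UTF-8 both in the response header and in the page itself:

Content-Type: text/html; charset=UTF-8

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />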

 

Incorrect modified date header

Make sure your server response includes the correct modified date for a page. Some servers just return the current date and time, which results in an incorrect sitemap and can mean it takes longer to spider your website.
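
For example, a correct Last-Modified response header reflects when the page content actually changed; the date here is a placeholder:

Last-Modified: Tue, 11 Feb 2014 10:30:00 GMT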

 

Conclusion 

Our spider trusts your server response, the headers, meta tags and page content.

Adopt standards and be clean and consistent in your approach to ensure that our spider, users and search engines can navigate your website effectively.

The cleaner your website implementation, the more effective it will be and the easier it will be to create an XML sitemap.

Improved support for character encoding and redirects

Saturday, 15 February 2014

Character encoding

We've improved our spider so that it can cope with a wider range of character sets including Arabic and Chinese.

Don't forget that for this feature to work correctly it is important that we can understand your website encoding; otherwise our spider won't interpret it correctly and your sitemap will contain strange characters and symbols.

http://www.w3schools.com/tags/ref_charactersets.asp

Improved HTTP 301 redirect and 302 Moved handling

We now follow HTTP 301 and HTTP 302 redirects automatically. In addition, when the spider first hits your home page, if we detect a 301 or 302 we will work out what the base domain for your website is.

Coupled with the improved pattern matching (see below), we can now determine your primary domain and match more variations to come up with a complete XML sitemap with canonical URLs.

Improved pattern matching

If you enter a single domain we will now check for more patterns:

http://www.xmlsitemapgenerator.org
https://www.xmlsitemapgenerator.org
http://xmlsitemapgenerator.org
https://xmlsitemapgenerator.org

When we create your sitemap we will use the version that you entered in the settings, so make sure this is the correct primary domain for your website.

Fixed https validation

You can now enter an https:// address without getting an invalid URL error.

We also now try to handle badly constructed URLs better, such as:
xmlsitemapgenerator.org///directory/test.html

Spider Performance Improvements

Thursday, 13 February 2014

Our spider was looking a little tired! At times some users were waiting a while for it to complete its job, and in some particularly busy periods were getting timeout errors.

The good news is we have done some housekeeping, clearing out millions of records, re-building, de-fragging, etc., and we are now ticking over a bit more smoothly.

Don't forget if you are having problems you can always contact us.

Support for http-equiv="refresh" added and more ....

Sunday, 9 February 2014

On the 9th of Feb we made some minor updates.....

New: Follow the meta refresh tag, e.g. http-equiv="refresh"


We found quite a number of websites using meta tags in their homepage to redirect to another page. Previously we were not detecting this, so our spider only found a homepage.

We now take the URL from the http-equiv tag and follow it, e.g.:

<meta http-equiv="refresh" content="0; url=http://example.com/">

New: Automatically follow both www and non-www URLs.

Some users did not understand which domain they were using, and were specifying full URLs within pages, mixing the www and non-www equivalents,
e.g. http://www.xmlsitemapgenerator.org and http://xmlsitemapgenerator.org

If you enter one we now follow both by default, but we don't duplicate your URLs. We list them once, using the domain you entered on the setup page.

Fix: Errors with duplicate URLs for frames.

We noticed that frames on a page with the same URL were causing the spider problems and throwing an error, so we fixed this!

Happy new year

Wednesday, 1 January 2014

Happy new year from the XmlSitemapGenerator.org team.

We wish you lots of success with your website in the new year!
