Search Engine - Sitemap

About

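The Sitemap protocol lets a webmaster tell search engines which URLs of a site are available for crawling. The protocol is documented at http://www.sitemaps.org/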

Syntax

A Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL:

  • when it was last updated,
  • how often it usually changes,
  • how important it is, relative to other URLs in the site

A simple example from the protocol documentation, showing the syntax for a single URL:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

   <url>
      <loc>http://www.example.com/</loc>
      <lastmod>2005-01-01</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.8</priority> <!-- 0 to 1 with a default of 0.5 -->
   </url>

</urlset> 
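The same structure can be generated programmatically. The following is a minimal sketch using Python's standard library; the function name build_sitemap and the dictionary-based entry format are assumptions made for illustration and are not part of the protocol.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the protocol example above.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML as bytes.

    `entries` is a list of dicts (a hypothetical format chosen for this
    sketch) with a required 'loc' key and optional 'lastmod',
    'changefreq' and 'priority' keys.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # Emit the child elements in the order used by the protocol example.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    # xml_declaration requires Python 3.8 or later.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    xml_bytes = build_sitemap([
        {
            "loc": "http://www.example.com/",
            "lastmod": "2005-01-01",
            "changefreq": "monthly",
            "priority": 0.8,
        }
    ])
    print(xml_bytes.decode("utf-8"))
```

Only loc is required for each URL; the other three elements are optional hints to the crawler.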

Usage

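  • Web crawlers usually discover pages from links within the site and from other sites. A sitemap supplements this by listing the URLs directly, so crawlers that support the protocol can pick up pages that are poorly linked.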

How to advertise it

Submit to individual engine

Submit sitemap to Google at: https://search.google.com/search-console/sitemaps

Robots.txt

The location of the sitemap file can be declared in the robots.txt file with a Sitemap: line.

You can declare more than one sitemap:

Sitemap: http://www.example.com/sitemap-host1.xml
Sitemap: http://www.example.com/sitemap-host2.xml
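To illustrate how a crawler might read these declarations, here is a small sketch that relies on Python's urllib.robotparser (its site_maps() method is available from Python 3.8). The sample robots.txt content and the helper name sitemaps_from_robots are assumptions for this example.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content (illustrative only).
ROBOTS_TXT = """\
User-agent: *
Allow: /

Sitemap: http://www.example.com/sitemap-host1.xml
Sitemap: http://www.example.com/sitemap-host2.xml
"""


def sitemaps_from_robots(text):
    """Return the list of sitemap URLs declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


if __name__ == "__main__":
    for sitemap_url in sitemaps_from_robots(ROBOTS_TXT):
        print(sitemap_url)
```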

Search engine HTTP request (Ping)

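A sitemap can also be submitted with an HTTP GET request that passes the URL-encoded location of the sitemap:

<searchengine_URL>/ping?sitemap=sitemap_url
# for example
<searchengine_URL>/ping?sitemap=http%3A%2F%2Fwww.yoursite.com%2Fsitemap.gz

Endpoints per search engine, where <encoded_sitemap_url> is the URL-encoded sitemap location:

Search Engine   Endpoint
google          http://www.google.com/webmasters/sitemaps/ping?sitemap=<encoded_sitemap_url>
microsoft       http://www.bing.com/webmaster/ping.aspx?siteMap=<encoded_sitemap_url>
yandex          http://blogs.yandex.ru/pings/?status=success&url=<encoded_sitemap_url>

Note that Google announced in 2023 that its ping endpoint was being retired, so check each engine's current documentation before relying on these URLs.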
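The URL encoding used in a ping request can be sketched in Python as follows. The helper name build_ping_url is hypothetical, and the endpoint prefix is passed in as-is rather than assumed to be valid; the sketch only builds the URL and does not send any request.

```python
from urllib.parse import quote


def build_ping_url(endpoint_prefix, sitemap_url):
    """Append the percent-encoded sitemap URL to an endpoint prefix.

    `endpoint_prefix` is expected to end with the query parameter,
    for example '<searchengine_URL>/ping?sitemap='.
    """
    # safe='' forces ':' and '/' to be percent-encoded as well.
    return endpoint_prefix + quote(sitemap_url, safe="")


if __name__ == "__main__":
    print(build_ping_url(
        "<searchengine_URL>/ping?sitemap=",
        "http://www.yoursite.com/sitemap.gz",
    ))
```

Passing the whole prefix, including the parameter name, keeps the helper independent of the differing parameter names (sitemap, siteMap, url) used by the engines.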




