Coded something up in Couch in an interesting way? Have a snippet or shortcode to share? Post it here for the community to benefit.
40 posts Page 1 of 4
Before using Couch, I used http://www.xml-sitemaps.com/ to generate an XML sitemap for each new website, which I then submitted via my Google webmaster account.

Now that my sites generally allow clients to create new pages, I guess I need a dynamic XML sitemap so that new pages get added to it. I've come across http://gsitecrawler.com/ - but would be grateful for any advice or recommendations other Couch users can offer on this!

Thanks.
Hi Potato,

We can use Couch itself to create a dynamically generated sitemap.

This is what a typical sitemap looks like (I used http://www.xml-sitemaps.com to get this and trimmed the entries down to only two):
Code:
<?xml version="1.0" encoding="UTF-8"?>
<urlset
      xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
            http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
<!-- created with Free Online Sitemap Generator www.xml-sitemaps.com -->

<url>
  <loc>http://www.yoursite.com/</loc>
  <lastmod>2012-09-10T13:30:35+00:00</lastmod>
  <changefreq>daily</changefreq>
</url>
<url>
  <loc>http://www.yoursite.com/about-us/</loc>
  <lastmod>2012-09-10T13:30:35+00:00</lastmod>
  <changefreq>daily</changefreq>
</url>

</urlset>

It is just an XML list of pages.
The repeating element here is the <url>..</url> block that contains information about each page in the site.

To generate such a list using Couch, we can create a separate template that outputs the XML list (let us call it 'sitemap.php').
In this template, we paste the code above.
We'll have to modify the XML part, as it has some quirks that are discussed in our docs on creating an RSS feed (http://www.couchcms.com/docs/concepts/rss-feeds.html).

To output the <url>..</url> blocks for each page, we keep only one such block and enclose it within the cms:pages tag.

A typical Couch site consists of more than one template, so we can either use the cms:pages loop multiple times, specifying a different template (the 'masterpage' parameter) each time, or, for a more generic solution, use the cms:templates tag (http://www.couchcms.com/docs/tags-refer ... tes-1.html) to loop through the available non-hidden templates and feed their names to the cms:pages tag.
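
For comparison, here is a rough sketch of the first option (one cms:pages loop per template). The template names 'index.php' and 'blog.php' are only placeholders for whatever templates your site actually has, and the optional <lastmod> element is left out to keep the sketch short. Every new template would need its own loop added by hand, which is why the cms:templates approach is the more generic one.

```php
<?php require_once( 'couch/cms.php' ); ?>
<cms:content_type 'text/xml' /><cms:concat '<' '?xml version="1.0" encoding="' k_site_charset '"?' '>' />
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

   <!-- One loop per template; 'index.php' is a placeholder name -->
   <cms:pages masterpage='index.php'>
      <url>
        <loc><cms:show k_page_link /></loc>
        <changefreq>daily</changefreq>
      </url>
   </cms:pages>

   <!-- Second loop; 'blog.php' is a placeholder name -->
   <cms:pages masterpage='blog.php'>
      <url>
        <loc><cms:show k_page_link /></loc>
        <changefreq>daily</changefreq>
      </url>
   </cms:pages>

</urlset>
<?php COUCH::invoke(); ?>
```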

Here is a complete working 'sitemap.php' template. You can download it and use it as-is in your site:
sitemap.zip
(562 Bytes) Downloaded 7420 times

Code:
<?php require_once( 'couch/cms.php' ); ?>
<cms:content_type 'text/xml' /><cms:concat '<' '?xml version="1.0" encoding="' k_site_charset '"?' '>' />
<urlset
      xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
            http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
           
   <cms:templates order='asc' >
      <cms:pages masterpage=k_template_name>
         <url>
           <loc><cms:show k_page_link /></loc>
           <lastmod><cms:date "<cms:if k_page_modification_date='0000-00-00 00:00:00'><cms:show k_page_date /><cms:else /><cms:show k_page_modification_date /></cms:if>" format='Y-m-d\TH:i:s+00:00' gmt='1' /></lastmod>
           <changefreq>daily</changefreq>
         </url>
      </cms:pages>
   </cms:templates>

</urlset>
<?php COUCH::invoke(); ?>

Hope this helps. Do let me know.
P.S. If you have not yet turned caching on for your site (in config.php), now is the time to do so :)
Thanks so much ... this generates what I'm looking for - very smart!

But I'm now wondering about the fact that the sitemap created is sitemap.php, rather than sitemap.xml ...

Have been searching for info on this and found http://www.daniweb.com/web-development/php/threads/240234/automated-sitemap-# - where there is reference to adding a Rewrite rule in the .htaccess file:
Code:
RewriteRule ^sitemap.xml? sitemap.php

I am way out of my comfort zone here ... what do you think?

CACHE : I have never switched on caching (via config.php) - will it mean faster page displays? From the comments in the config.php file, am I correct in thinking that as soon as a page is changed in the Admin Panel, the cached version of the page (if it exists) will no longer be served, even if it hasn't yet been deleted from the cache, i.e. it will be superseded by a freshly generated version?
I think (but am not absolutely sure) that the .php extension should not matter. The template declares its contents to be in XML format by using HTTP headers internally so that should suffice.
Why don't you try and submit this sitemap.php (or /sitemap/ with prettyURLs) to Google webmaster tools? It'll report immediately whether or not the format is acceptable. Please let us know.

In any case, we have the .htaccess method to fall back upon. The rewrite rule you mentioned should work well:
Code:
RewriteRule ^sitemap.xml? sitemap.php

Finally, regarding the caching - do use it on all your sites, and non-admins (which is pretty much the rest of the world except you) should see a perceptible difference in load times. A cached page is almost as efficient as a static HTML page.
Admins are never served cached pages so it won't hamper you in your development activity.

As soon as you modify something in the admin-panel, the cache gets invalidated and (whether or not a cached file is physically present) the next time a page is requested it is generated afresh by Couch with a copy being cached to be used for all future requests (till the cache gets invalidated again - whereupon the cycle gets repeated).
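
As a minimal sketch, turning the cache on is a one-line change in config.php. This assumes the setting is named K_USE_CACHE, as I believe it is in the stock config file; please check the comments in your own copy, as the name may differ between versions.

```php
<?php
// Excerpt from couch/config.php (setting name assumed; verify against your own file).
// 0 = caching off, 1 = caching on.
define( 'K_USE_CACHE', 1 );
```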
I can now confirm that the sitemap does not need to have a .xml extension.
Got this from the FAQs:
Does my Google Sitemap have to end with .xml?
No, you can name it whatever you like, just make sure you are sending the correct mime type (text/xml for xml data). You can configure your Apache server to send "text/xml" for your favourite extension by adding "AddType text/xml .yourext" to your .htaccess file or httpd.conf.

Sending the correct mime type is what matters and that is done automatically by Couch.
So, no need to configure anything that is mentioned here. Our sitemap.php template should work fine as it is.
Brilliant ... :D
I have followed these steps but I seem to be getting a 404 when I try to access /sitemap. Any ideas what I am doing wrong?

Here is my sitemap.php:
Code:
<?php require_once( 'admin/cms.php' ); ?>
<cms:content_type 'text/xml' /><cms:concat '<' '?xml version="1.0" encoding="' k_site_charset '"?' '>' />
<urlset
      xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
            http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
           
   <cms:templates order='asc' >
      <cms:pages masterpage=k_template_name>
         <url>
           <loc><cms:show k_page_link /></loc>
           <lastmod><cms:date "<cms:if k_page_modification_date='0000-00-00 00:00:00'><cms:show k_page_date /><cms:else /><cms:show k_page_modification_date /></cms:if>" format='Y-m-d\TH:i:s+00:00' gmt='1' /></lastmod>
           <changefreq>daily</changefreq>
         </url>
      </cms:pages>
   </cms:templates>

</urlset>
<?php COUCH::invoke(); ?>


Here is my .htaccess file:
Code:
Options +Indexes +FollowSymlinks -MultiViews
<IfModule mod_rewrite.c>
RewriteEngine On

#If your website is installed in a subfolder, change the line below to reflect the path to the subfolder.
#e.g. for http://www.example.com/subdomain1/subdomain2/ make it RewriteBase /subdomain1/subdomain2
RewriteBase /

#If you wish to use a custom 404 page, place a file named 404.php in your website's root and uncomment the line below.
#If your website is installed in a subfolder, change the line below to reflect the path to the subfolder.
#e.g. for http://www.example.com/subdomain1/subdomain2/ make it ErrorDocument 404 /subdomain1/subdomain2/404.php
#ErrorDocument 404 /404.php

#If your site begins with 'www', uncomment the following two lines
#RewriteCond %{HTTP_HOST} !^www\.
#RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]

#sitemap redirect
RewriteRule ^sitemap.xml? sitemap.php

#DO NOT EDIT BELOW THIS


RewriteCond %{REQUEST_FILENAME} -d [OR]
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule . - [L]

#blog.php
RewriteRule ^blog$ "$0/" [R=301,L,QSA]
RewriteRule ^blog/$ blog.php [L,QSA]
RewriteRule ^blog/.*?([^\.\/]*)\.html$ blog.php?pname=$1 [L,QSA]
RewriteRule ^blog/([1-2]\d{3})/(?:(0[1-9]|1[0-2])/(?:(0[1-9]|1[0-9]|2[0-9]|3[0-1])/)?)?$ blog.php?d=$1$2$3 [L,QSA]
RewriteRule ^blog/[^\.]*?([^/\.]*)/$ blog.php?fname=$1 [L,QSA]
RewriteRule ^blog/[^\.]*?([^/\.]*)$ "$0/" [R=301,L,QSA]
</IfModule>
Your .htaccess does not seem to have an entry for sitemap.php and hence the 404.
Please use gen_htaccess.php again to create a new .htaccess.

I'd like to also add that the
RewriteRule ^sitemap.xml? sitemap.php
that you have is not necessary (as discussed in this thread). Simply submit the normal URL of your sitemap to Google.
This fixed it. Thank you so much!
When I submitted 'sitemap.php' in Google's Webmaster Tools, it returned an error:

Your Sitemap appears to be an HTML page. Please use a supported sitemap format instead.

And when I submit '/sitemap/', it says:

We encountered an error while trying to access your Sitemap. Please ensure your Sitemap follows our guidelines and can be accessed at the location you provided and then resubmit.

General HTTP error: 404 not found
Sitemap: /sitemap/
HTTP Error: 404

Any idea how we can submit your sitemap template to Google?