CF solution to creating sitemaps

Apr 19, 2007

I was wondering if anyone knew of any CFC or UDF to create a google sitemap (XML) as well as an HTML sitemap based on the XML document. I found a couple of solutions, but they are written in python and that is not an option.

I've already searched the forums to see if anyone else has implemented a solution. I also checked cflib and other sites to see if there were any UDFs or CFCs available for my needs. My boss would rather I find a complete third-party solution than have me code one.

On a side note, does anyone know of a limitation on cfdirectory that would cause a null pointer exception when running cfdirectory recursively on the web root? I had already started a simple solution before being asked to find a 3rd party solution. My best guess is a memory issue from returning a very large result set from cfdirectory. I can't find documentation, but it makes sense to me.

Thanks in advance.

Apr 20, 2007

Hi,

"Jivebot", a cool online tool, can get this task done for you. Try it at:

http://www.jivebot.com/beta/index.cfm

Apr 20, 2007

That is a great tool, but the site has more than 500 pages. Thanks for the link.

Apr 20, 2007

Hi,

Another online utility (which can even automatically upload the created sitemap to your FTP server and ping Google) is "PINGOAT".

But this is not a CF-based tool.

Try it at:
http://pingoat.com/goat/google_sitemap

Apr 20, 2007

I've found a couple of good tools that do exactly what we need, only they aren't CF. I have to find a solution that is CF (limited to what we currently have installed) and that we can install on our own servers.

Apr 20, 2007

Google site maps are flat structures, and standard HTML (i.e. aimed at
humans) site maps are generally hierarchical. You're not going to get one
piece of code to do both. Well: I mean it's possible, sure, but it'd be
pretty grim code.

Without you indicating how your site hierarchy is composed, it's pretty
hard to suggest how you should approach either task.
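
To make the flat-versus-hierarchical point concrete, here is a minimal sketch in Python (for illustration only, not CFML). It assumes, purely as an example, that the site hierarchy simply mirrors the directory structure, which may not be true of the poster's site. The names `build_tree` and `render_html` are hypothetical helpers.

```python
from html import escape


def build_tree(paths):
    """Group flat 'a/b/c.cfm' style paths into nested dicts keyed by segment."""
    tree = {}
    for path in paths:
        node = tree
        for part in path.strip("/").split("/"):
            # Descend into (or create) the child node for this path segment.
            node = node.setdefault(part, {})
    return tree


def render_html(tree, indent=0):
    """Render the nested dicts as a nested <ul> list (assumed HTML layout)."""
    pad = "  " * indent
    lines = [pad + "<ul>"]
    for name in sorted(tree):
        if tree[name]:
            # A segment with children is treated as a folder.
            lines.append("%s  <li>%s" % (pad, escape(name)))
            lines.append(render_html(tree[name], indent + 2))
            lines.append("%s  </li>" % pad)
        else:
            lines.append("%s  <li>%s</li>" % (pad, escape(name)))
    lines.append(pad + "</ul>")
    return "\n".join(lines)
```

The flat list of paths is all the XML sitemap needs; the HTML version needs this extra grouping step, which is where the two outputs diverge.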

However there is NOTHING to the logic required to generate a Google site
map, so how come you don't convert the Python logic you've found to CFML?
How hard can it be?
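
As a rough illustration of how little logic is involved, here is a minimal Python sketch that turns a list of URLs into sitemap XML. It is not the script the original poster found; `build_sitemap` is a hypothetical name, and only the required `<loc>` element of the sitemaps.org 0.9 format is emitted.

```python
from xml.sax.saxutils import escape

# Namespace of the sitemaps.org 0.9 protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url> entry per URL."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="%s">' % SITEMAP_NS,
    ]
    for url in urls:
        # Entity-escape &, < and > so query strings do not break the XML.
        lines.append("  <url><loc>%s</loc></url>" % escape(url))
    lines.append("</urlset>")
    return "\n".join(lines)
```

The same loop ports directly to CFML: iterate over the file list, escape each URL, and write one `<url>` element per page.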

As for the <cfdirectory> thing... how many files have you got? How deep
does the dir structure go? How much RAM? Are you *sure* it's the
<cfdirectory> statement that's causing the problem? How about posting some
code? What OS are you running on?

You're not really giving us much to go on to help you, here.

--
Adam

Apr 20, 2007

You are correct, the code to produce the XML is simple and I have it working, only I get the "null pointer" error when attempting to run it at my root with recursion turned on. I've attached the code below, which has successfully produced a site map in a smaller test directory.

The code question was just an aside because I didn't see anyone with this issue in other posts. It works recursively on a directory several layers deep with about 1000 files total. However, when I run it recursively on development in webroot (which contains archived directories, etc. and who knows how many files), that is when I get the null pointer error. I know that I ran a shell script to produce a report on all HTM* and CFM files and it numbered well over 10k.

Of course, I could run two cfdirectory calls, one filtering for CFM files and one for HTM* files, which might be less of a memory hog. But I was asked to look at pre-existing solutions before continuing development.

As far as server set up, we are running Sun One webservers w/CFMX7 - I don't know about the RAM.

As far as a solution that produces both HTML and XML site maps, well I have to prove that it isn't a simple solution before I can say it can't be done.

If I am to continue working on this solution, I will "clean it up" (i.e. it will be a UDF or CFC) and rely on exclusions in robots.txt to eliminate any files/directories, and the logic will probably have to be changed to accomplish that feature. This was just a proof of concept for my boss to see if it could be done or not in-house.
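
The robots.txt exclusion idea can be sketched as follows, in Python for illustration rather than CFML. It uses the standard library's `urllib.robotparser`; the helper name `allowed_urls` and the example rules are assumptions, not anything from the poster's code.

```python
from urllib.robotparser import RobotFileParser


def allowed_urls(robots_lines, urls, agent="*"):
    """Keep only the URLs that the given robots.txt lines allow for `agent`."""
    parser = RobotFileParser()
    # parse() accepts the robots.txt content as an iterable of lines.
    parser.parse(robots_lines)
    return [url for url in urls if parser.can_fetch(agent, url)]
```

For example, with the rules `User-agent: *` and `Disallow: /archive/`, a URL under `/archive/` would be dropped before it ever reaches the sitemap. A CFML version would need its own small parser for the `Disallow` lines.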

Thanks for any input.

Apr 20, 2007

Is it your <cfdirectory> erroring, or is it the QoQ?

I don't think you need your QoQ, you can simply filter your query loop,
further down.
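
The "filter in the loop" idea looks roughly like this in Python (an illustration of the approach, not a drop-in for `<cfdirectory>`). Files are visited one directory at a time and filtered by extension as they are seen, so no single huge result set is built first. The name `iter_pages` and the extension list are assumptions.

```python
import os

# Extensions treated as site pages (assumed for this example).
WANTED = (".cfm", ".htm", ".html")


def iter_pages(root, extensions=WANTED):
    """Yield paths relative to `root` for files with a wanted extension."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            # Filter inside the loop instead of querying a full listing later.
            if name.lower().endswith(extensions):
                full = os.path.join(dirpath, name)
                yield os.path.relpath(full, root).replace(os.sep, "/")
```

Because this is a generator, the caller can write each sitemap entry as it arrives rather than holding every path in memory.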

A superficial glance at your code (it's Friday evening, and we just lost
the cricket, so "superficial" is all you get 😉) suggests it's OK.

--
Adam

Apr 23, 2007

Thanks for looking it over. Sometimes an extra pair of eyes catches an improper implementation of a tag.

The code works fine in my test directory (i.e. a handful of subfolders with a fair amount of files) - I get my XML file. However, when I move it up to the webroot directory and have recursive = true, I get the null pointer error. As soon as I turn recursive off, the code runs fine and I get the XML file. To test to see if it was cfdirectory, I created test code and got the same error.

The problem isn't with my code, but (I am guessing) with how much data cfdirectory is returning. I do the QoQ to cut down on looping, because the number of files/directories is well over 10,000.

The funny part of all of this is that management wants an HTML site map more so than the Google site map and they would rather go with 3rd party rather than in-house code. Too bad, because the Google site map is easier to produce. The search continues for something that fits the bill.

If I find anything, I'll post it here so others can save hours, going on days. 🙂

Apr 23, 2007

Hi,

If you want to go for a CF-specific solution, then try the "Site Page Snack" tool available at:

http://googlebotsnacks.com/index.cfm/fuseaction/snacks.GoogleSnacks/Google_Bot/Snacks/Adsense/google...

Apr 23, 2007

That definitely looks like it would work for the XML portion of the site map. Now all I need to find is an HTML sitemap program. :-) I actually found one that does both, but it is written in Python. It takes a big feat to justify installing anything on our servers.
