11 Replies Latest reply on Oct 1, 2010 5:20 AM by Adam Cameron.

    404 Handler loses FORM scope

    Owain North Level 4

      Howdy people, hoping someone can help me with this as I can't find anything previous.

       

      Having only ever really written company internal systems, I've never really been that fussed about SEO. However I'm now working on a site where it's very important, so I went about creating a handling page which would catch *any* page, figure out what you meant and use getPageContext().forward() to get the actual content. I've always known the theory behind doing so, but have never really gotten this far into the gubbins of it all.

       

      Now what probably doesn't help is that I'm trying to be (maybe too) clever - I don't want Google and suchlike flagging up "duplicate" content by having, for example, the site accessible by http://www.mysite.com/ and http://www.mysite.com/index.cfm. I appreciate that in theory this only requires me to not ever create links to the .cfm files, but sooner or later I will, or someone will figure one out, etc etc. Therefore I've tried storing all my .cfm pages *above* the web root, then using CF to get and include the content. Essentially, no pages actually exist (except the 404 page and 404 handler) - IIS routes all 404s to my CF handler page, as does my onMissingTemplate method (although this should never get hit if there are no .cfm pages).

       

      However, I no longer have a FORM scope. Googling around seems to have revealed the issue - at some point recently IIS changed the way it calls your 404 handler - it now uses a separate thread to the original request, and so your submitted FORM values never get to CF.

       

      So, my questions are:

       

      1. Is there anything I can do about this? Have I jumped to the wrong conclusion?

       

      or probably more sensible

       

      2. Have I completely wandered so far down a dark pointless alley that I can no longer see the wood for the trees, and so have come up with a massively convoluted system which is far more complex than it ever needed to be?

       

      I'd suspect the latter, but I'm sure I'm not the first person to want a site to work in this way. I apologise for a post on the boring old subject of SEO but hopefully this is a little more in-depth than the standard "how do I SEO my site" question. As I say I've got everything working - examining the URL, figuring out which page to go to from an XML document and even translating the URL into variables which I pass to the included pages. Everything works, except I have no FORM scope.

       

      If anyone can point me in the right direction, I'd be most appreciative.

       

      O.

        • 1. Re: 404 Handler loses FORM scope
          Adam Cameron. Level 5

          2. Have I completely wandered so far down a dark pointless alley that I can no longer see the wood for the trees, and so have come up with a massively convoluted system which is far more complex than it ever needed to be?

           

           

          I think you have.

           

          However I'm now working on a site where it's very important, so I went about creating a handling page which would catch *any* page, figure out what you meant and use getPageContext().forward() to get the actual content. I've always known the theory behind doing so, but have never really gotten this far into the gubbins of it all.

          I'm not sure I quite follow.

           

          Are you saying you're going to have a URL like http://www.owain.co.uk/some/path/to/a/file.cfm and you are going to 404-handle it, and redirect it to http://www.owain.co.uk/index.cfm which is then gonna look at the /some/path/to/a/file.cfm and work out that you need to include x.cfm, y.cfm and z.cfm etc?

           

          You don't want to be doing that with a 404 handler.  You want to be doing it with mod_rewrite (or isapi_rewrite, or whatever).

           

          Get the web server to route the SEO friendly URL to the correct CF-ready URLs.  Don't get CF to do it.

           

          --

          Adam

          • 2. Re: 404 Handler loses FORM scope
            Owain North Level 4

            Hi Adam

             

            Cheers for the reply, it's much appreciated.

             

            No, I'm not saying I'd redirect /site/page/mypage.cfm to /myother/page/index.cfm, I'm saying for example that I'd redirect mysite.com/person/398473/dave-smith-barry/muppet/anything to mysite.com/person.cfm?personid=398473, ie being able to fill the url with any old SEO-friendly garbage I see fit.

             

            I don't want *any* of my public URLs to contain .cfm preferably, because I think it looks a bit crap, especially if it's not at the end. However if they don't contain .cfm they don't get passed to onMissingTemplate(), hence no FORM scope.

             

            Cheers

            O.

            • 3. Re: 404 Handler loses FORM scope
              Adam Cameron. Level 5

              Yep, well in that case it's def a job for mod_rewrite.

               

              --

              Adam

              • 4. Re: 404 Handler loses FORM scope
                Owain North Level 4

                Hi Adam

                 

                Whilst, again, I've never actually used mod_rewrite I'm aware of what it does - however I don't know whether it does "masked" redirects or whether it would actually end up changing the url bar?

                 

                I had assumed it would simply redirect; if it doesn't then that's almost certainly what I'm after.

                 

                Cheers

                Owain.

                • 5. Re: 404 Handler loses FORM scope
                  Adam Cameron. Level 5

                  mod_rewrite does a few things, the relevant two are:

                  * redirects one URL to another (by, like, doing a 301 redirect).  Not what you want in this situation.

                  * rewrites a URL internally, which is exactly what you want.

                   

                  Say you have URLs thus:

                   

                  http://www.owain.co.uk/product/1234/

                   

                  You could have a rewrite thus:

                   

                  RewriteRule ^/product/([0-9]+).*$ /index.cfm?product=$1

                   

                  The web server will then treat the URL as /index.cfm?product=1234, which gets passed on to CF because it's a CFM page (in the way it normally would).  The client doesn't see any of this, and is none the wiser.
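
                  As a rough sketch of how the same idea might apply to the /person/398473/dave-smith-barry/... URLs mentioned earlier in the thread: the person.cfm target and the personid parameter come from that post, while the exact pattern and the flags are assumptions. The leading slash follows the rule above, which suits server-level config; in a per-directory .htaccess under Apache the leading slash is dropped from the pattern.

                  ```apache
                  RewriteEngine On

                  # Internal rewrite, not a redirect: the address bar keeps the friendly URL.
                  # It is still the same request, so a POSTed form body should reach CF intact.
                  # ([0-9]+) captures the person id; (/.*)? swallows any trailing SEO text.
                  # [QSA] appends any original query string; [L] stops processing further rules.
                  RewriteRule ^/person/([0-9]+)(/.*)?$ /person.cfm?personid=$1 [QSA,L]
                  ```

                  With something like this, /person/398473/dave-smith-barry/muppet/anything would be handled as /person.cfm?personid=398473.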

                   

                  mod_rewrite (and its isapi counterparts) is very powerful & flexible, and has very little overhead.

                   

                  --

                  Adam

                  • 6. Re: 404 Handler loses FORM scope
                    Owain North Level 4

                    Oh man, so after all my pain I could've just used a rewrite engine to start with? Someone punch me in the face, hard. Luckily I've just been working on a Perl project, so my head is only thinking in terms of regular expressions at the moment.

                     

                    Cheers muchly Adam, I'll get right on it.

                     

                    O.

                    • 7. Re: 404 Handler loses FORM scope
                      Adam Cameron. Level 5

                      I'd never used mod_rewrite (or in our case Helicon's ISAPI version) until my current job, at which I've been for four months or so now.  But because I'm reasonably good with regexes, I've kind of inherited the management of our .htaccess files.

                       

                      A coupla tips that might be handy for a newbie to save you the pains I went through:

                      * investigate rewrite maps from the outset.  They're bloody handy.

                      * when one has a RewriteRule and a bunch of RewriteConditions, eg:

                       

                      RewriteCond TestString CondPattern
                      RewriteCond TestString CondPattern
                      RewriteCond TestString CondPattern
                      RewriteRule Pattern Substitution

                       

                      Despite how it looks, the RewriteRule PATTERN is processed first, and then each of the conditions from top to bottom, and then finally the Substitution.  It's not logical and it's also not immediately apparent from the docs (well at least my reading of them).  Processing will stop for that rule as soon as a pattern doesn't match.

                       

                      * A rewritemap substitution can only be used in Substitutions and TestStrings.  Not in Patterns or CondPatterns.

                       

                      * In at least Helicon's implementation, one cannot have a comment on the same line as a RewriteMap statement, eg:

                       

                      RewriteMap mapLabel txt:mapfile.map # some helpful comment here

                       

                      On any other line, one can comment like this NP.
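
                      To make the evaluation-order and rewrite-map tips a bit more concrete, here is a small sketch. The map name productmap, the file productmap.txt and the slugs in it are made up for illustration, and where the RewriteMap line is allowed to live differs between Apache and Helicon, so treat the placement as an assumption.

                      ```apache
                      # productmap.txt (hypothetical), one "key value" pair per line:
                      #   blue-widget   1234
                      #   red-widget    5678

                      RewriteEngine On
                      RewriteMap productmap txt:productmap.txt

                      # Evaluation order for the block below:
                      #   1. the RewriteRule Pattern is matched, capturing the slug as $1
                      #   2. the RewriteCond lines are tested, top to bottom
                      #   3. only if all of them pass is the Substitution applied
                      # The $1 in the TestString works because step 1 has already run.
                      # The map lookup appears only in a TestString and a Substitution.
                      RewriteCond ${productmap:$1|NOTFOUND} !^NOTFOUND$
                      RewriteRule ^/product/([a-z0-9-]+)/?$ /index.cfm?product=${productmap:$1} [L]
                      ```

                      Here /product/blue-widget/ would be rewritten to /index.cfm?product=1234, while a slug that is missing from the map fails the condition and is left alone.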

                       

                      HTH.

                       

                      --

                      Adam

                      • 8. Re: 404 Handler loses FORM scope
                        Owain North Level 4

                        Yes mate, that helps a lot. Irritatingly this'll be on a Server 08 box, so it'll probably be using IIS7's own rewrite engine. However I do believe that's not dissimilar to Helicon's offering, so hopefully should be okay.

                         

                        Guess I best start Googling

                         

                        Cheers again

                        O.

                        • 9. Re: 404 Handler loses FORM scope
                          Adam Cameron. Level 5

                          From what little exposure I've had to it, IIS7's rewriting ain't bad.  I think one needs to create the rules via the UI though - rather than use a .htaccess file - which might limit its flexibility, but for the stuff you're needing to do it'd do just fine.

                           

                          --
                          Adam

                          • 10. Re: 404 Handler loses FORM scope
                            Owain North Level 4

                            Just had a play with the IIS7 engine - turns out you can create rules yourself, you just need them in a web.config file in the web root which lists them in the right syntax.

                             

                            Definitely much easier than doing it in CF mind, with the only downside that I kinda like the idea of my apps being completely self-contained, but in this case I think the benefits massively outweigh the downsides anyway.

                             

                            Cheers again Adam.

                            • 11. Re: 404 Handler loses FORM scope
                              Adam Cameron. Level 5

                              Cool.  No worries mate.

                               

                              --
                              Adam