5 Replies Latest reply on Apr 17, 2006 12:02 PM by Robert_Oakland

    Multiple CFHTTP and improving performance

      I've built a lightweight federated search engine for our library. Right now I'm using cfhttp screen scraping to get at the data. I then parse the data, clean it up a bit, and output it in a standardized format, including direct links to the full-text resources (proxied and protected, of course!). I'm looking for ways to speed this up.

      None of the resources I'm accessing have SOAP or XML interfaces (grr!). Several do support Z39.50, a fairly library-specific protocol that _might_ perform faster (no guarantee: it's an old protocol that is soon to be replaced), but I can't find any direct support for it in ColdFusion (I'm on 6.1). I think if I upgrade to 7 I might be able to use the gateway, or at least write a port-based Z39.50 interface if I must. But before I do anything that drastic, I was wondering if anyone had any tips on speeding up cfhttp calls. I've looked at the existing tips (don't use resolveURL, use the IP address instead of the domain name) and have applied those where possible.

      What I'm really hoping for is a way to thread or fork processes/functions/objects within ColdFusion. Right now the major drain on my cfhttp calls is, of course, waiting for the data to go back and forth over the internet. This is especially true of one database that requires about a ten-page handshake to read and set all the cookies it wants before getting to the search screen and result sets. I am currently accessing and scraping 3 sites and would like to add 2 more. Of the current 3 sites, two require 3 cfhttp calls each to get to the result set, and one requires 10 (yikes!!!!). So I'm looking at 16 cfhttp calls. The whole process takes about 15-25 seconds to finish now. If I could fork or thread these processes, at least for the main resources (I've written the code to access each one as a cfobject), I'm anticipating that time dropping to about 7-15 seconds...
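The timing estimate in the post can be sanity-checked with a little arithmetic. The sketch below is Python rather than CFML, purely as a calculator, and it assumes something the post does not state: that every cfhttp call costs roughly the same. The function name `parallel_estimate` is made up for illustration.

```python
# Back-of-the-envelope estimate using only the numbers quoted in the post.
# Assumption (not from the post): every cfhttp call costs about the same.

calls_per_site = [3, 3, 10]          # cfhttp calls needed per site
total_calls = sum(calls_per_site)    # 16 calls when run one after another
longest_chain = max(calls_per_site)  # 10 calls: the cookie-heavy site


def parallel_estimate(sequential_seconds: float) -> float:
    """Estimated wall time if each site's chain of calls runs concurrently."""
    per_call = sequential_seconds / total_calls
    return per_call * longest_chain


for sequential in (15.0, 25.0):
    estimate = parallel_estimate(sequential)
    print(f"{sequential:.0f} s sequential -> about {estimate:.1f} s in parallel")
```

Under that equal-cost assumption, the 10-call site sets the floor: the calls within one site's handshake still have to run in order, so threading only removes the time spent on the two shorter chains. The 7-15 second guess would hold only if the 10-call site's requests are somewhat faster than average.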
        • 2. Re: Multiple CFHTTP and improving performance
          Robert_Oakland Level 1

          Thanks, very nice! It looks like it will do what I want. Before I hop on board with this approach, though (especially since I am on CF 6 and they warn that this might not "always" work on CF 6, and they can't explain or figure out why), I wanted to ask about the CF 7 asynchronous gateways. It looks like it would be fairly trivial to turn my CFCs into gateways, and then I could invoke them in order (umm, or whatever CF 7 refers to this as; I'm still on 6.1). The CFCs store the data in server session variables, so using this method would still allow me to access the data they create (I think). The major issue with cfhttp as I see it (and the one the excellent tool you posted fixes) seems to have less to do with the underlying speed of the Java code that cfhttp runs on than with the fact that it can't be invoked asynchronously, correct? If that is so, wouldn't using async gateways solve my problem (well, that and upgrading to CF 7 to get at them, but I'm already looking at some other cool gateway integrations as well...)? Has anyone had a chance to compare the CF 7 async gateway against CFX_HTTP5 for asynchronous cfhttp retrieval? I hope the performance difference is small to none, because recoding for gateways will be a lot quicker and cleaner than recoding for CFX_HTTP5 (especially for grabbing and sorting cookie data and then passing it back, which all these sites require, and there is the one that does about 8 cookie handshakes, gack!).
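For readers unfamiliar with what "grabbing and sorting cookie data, and then passing it back" involves, here is a minimal sketch of the bookkeeping. It is Python, not CFML, and it is not how cfhttp or CFX_HTTP5 handle cookies; the `CookieSession` class and the header values are invented for illustration, and no real HTTP requests are made.

```python
from http.cookies import SimpleCookie


class CookieSession:
    """Hypothetical helper: remembers cookies across a chain of requests."""

    def __init__(self) -> None:
        self.cookies = {}  # cookie name -> latest value seen

    def absorb(self, set_cookie_headers: list) -> None:
        """Store name/value pairs from one response's Set-Cookie headers."""
        for header in set_cookie_headers:
            parsed = SimpleCookie()
            parsed.load(header)
            for name, morsel in parsed.items():
                self.cookies[name] = morsel.value

    def header(self) -> str:
        """Build the Cookie header value to send with the next request."""
        return "; ".join(f"{name}={value}" for name, value in self.cookies.items())


# Simulated Set-Cookie headers from two steps of a handshake (made-up values).
session = CookieSession()
session.absorb(["SID=abc123; Path=/; HttpOnly"])
session.absorb(["LANG=en; Path=/", "SID=def456; Path=/"])
print(session.header())
```

Each step of a handshake has to read the cookies from the previous response before the next request can be sent, which is why the calls for a single site cannot simply be fired off all at once.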
          • 3. Re: Multiple CFHTTP and improving performance
            jdeline Level 1
            You might want to contact Andrei Kondrashev, the author of the CFX, and discuss your problem. He is quite knowledgeable about this kind of stuff.
            • 4. Re: Multiple CFHTTP and improving performance
              1. Async gateways are based on the "shoot and forget" idea, i.e. they do not provide (at least not directly) feedback (results) to your application. So they are more "gateways" than "asynchronous". All the attempts I have seen so far to implement asynchronous processing within the same CF page using AG are too heavy, contain too many moving parts, and lose all the performance gain at the result-return stage (writing and reading files, databases, etc.). So AG makes sense if you want to save minutes, not milliseconds. If your application just collects data [in a database] and does not need to construct results on the fly, AG will work for you.

              2. CFX_HTTP5 implements real physical multithreading (CF threads are not involved). Results are returned in the natural way, as with any CF tag. Even a single HTTP call can be faster than with CFHTTP (if your target supports gzip, for example, you can get data MUCH faster, since the tag will decompress it for you). With simultaneous execution (especially when the requests go to different sites), the gain is obvious and simple: your total HTTP time is the time needed to execute the longest request. Almost a linear function.

              3. CFX_HTTP5 natively supports "sessions" (including the security context on both client and server). So you do not even need to think about cookies.
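The claim in point 2, that with simultaneous execution the total HTTP time is roughly the time of the longest request, can be illustrated with a small sketch. This is Python with a standard-library thread pool, not CFX_HTTP5 or CFML, and it says nothing about how the tag is implemented. The site names and delays are made up, and `fetch` sleeps instead of doing real HTTP.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Simulated per-site latencies in seconds (made-up, scaled down for a quick run).
SITE_DELAYS = {"site_a": 0.05, "site_b": 0.10, "site_c": 0.20}


def fetch(site: str) -> str:
    """Stand-in for one site's whole request chain; sleeps instead of doing HTTP."""
    time.sleep(SITE_DELAYS[site])
    return f"results from {site}"


def fetch_all(sites: list) -> tuple:
    """Run every site's chain in its own thread; return (results, wall time)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(sites)) as pool:
        results = list(pool.map(fetch, sites))  # results keep the input order
    return results, time.perf_counter() - start


results, elapsed = fetch_all(list(SITE_DELAYS))
sequential = sum(SITE_DELAYS.values())
print(results)
print(f"sequential would take about {sequential:.2f} s, concurrent took {elapsed:.2f} s")
```

Unlike a "shoot and forget" gateway, the caller here blocks until every result is back and gets the results in hand, which is the behaviour point 1 says is hard to get from async gateways within a single page.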
              • 5. Re: Multiple CFHTTP and improving performance
                Robert_Oakland Level 1
                Thanks, adiabata. I'm guessing that you are Andrei Kondrashev, the developer of CFX_HTTP5? Please disregard the message I sent you on another web site (slater@) asking about this very discussion. You jumped in and answered my question before I even got in touch with you. If I have questions about the specifics of implementing CFX_HTTP5 in a threaded manner to get the best, most efficient results for my application, would it be appropriate to discuss them in this thread (or even this forum), or would the discussion be better suited to a different forum or site?