10 Replies Latest reply on Dec 14, 2008 6:29 AM by Atul Paralikar

    Help me understand clustering and fail-over better

    DCwebGuy Level 1
      I have been in bed with CF since 1997. However, I have never worked for a company that got so much traffic that we had to worry about load balancing and fail over. Now, finally, I am working on a personal project where clustering and fail-over would be just the ticket. So I am getting to know CF8 Enterprise and clustering, but have a few high-level questions perhaps one of you can help me with. To make it easy I have bolded the actual questions. Sorry for all the blah blah in between.

      First, to help you understand my setup and what I'm trying to accomplish: I want to test load-balancing and fail-over on the grandest scale. So just the other day I successfully installed one instance of multi-server CF8 Enterprise on 3 separate boxes (3 could be 100, but I started with 3), and clustered them. They are all Linux/Ubuntu 7.10 on Apache2 and MySQL. Totally clean installs.

      FYI, these "boxes" are actually "instances" running on Amazon's EC2 compute cloud, which makes it ridiculously cheap and easy to test all kinds of stuff and just walk away without consequence when something goes awry or I'm done testing. Since they aren't on the same subnet, all I had to do was plug their IPs into the JRun security config and, voila, the CF cluster was talking to them all. But I digress....

      Box 1, let's call it, has my cluster, which contains 1) a single instance of itself, 2) a single remote instance from Box 2, and 3) a single remote instance from Box 3. I've made the assumption that having multiple boxes is where the real benefit from load balance and fail-over kicks in. E.g., if Box 1 goes down, Box 2 and 3 will pick up, and vice versa. Is that a correct assumption or understanding of how CF cluster fail-over and load balance work?

      Anyway, following Adobe's instructions here I then ran the Web Server Configuration Tool. When I ran the config tool, it of course already recognized I had an Apache site running called [localhost:cfusion] Apache: /etc/apache2. So I clicked "Add" anyway and selected my new cluster instance from the Jrun Server drop-down (which contained both "cfusion" and my new instance), left it set to Apache web server, put in my Apache config directory, checked the box for "Configure server for CF8 applications" (I don't understand this part, but whatever), and hit OK to restart Apache.

      Now here's one of my actual questions: after all this it comes back saying "This web server is already configured for Jrun". 99% of the time I'm sure folks will have a web server already running when they install CF, so of course CF is going to configure it. Why then did the instructions tell me to do this, or rather NOT tell me to uninstall the original Apache config?

      So at that point I ran config again, REMOVED the Apache web server, and re-added it using my cluster (followed the exact same instructions from above). This time it worked fine. But here's where it gets interesting.

      When I went back to my CF Admin for this box the Enterprise Manager nav button/control panel for managing my instances and clusters was gone! Why?

      I see at the top of CF Admin that the "Server:" is now calling itself by my new cluster name. Okay, this makes sense I guess because I just removed the original Apache config and told the new Apache to use the Jrun cluster server instance to serve up. Fine.

      So now how am I supposed to manage my cluster if I can't see the cluster manager?

      My instances on the other two boxes still have their original "cfusion" Apache configurations.

      Wanting to get to my real goal, how do I now TEST that my cluster (even though I can't see it in the CF Admin any more) will actually fail over to the remote instances I included in it? And how do I TEST load balancing?

      Whatever I learn from this experiment I am going to scale up to 100 servers and post back my results to this thread, so if anyone can help me put the final pieces of this puzzle together, it will assist me greatly!

      Final two questions: Is there any difference in the way CF8 handles clustering and load balance compared to CF7 or even CF6.1? This info will help me sort through the various blogs when looking for solutions. Since Adobe is discontinuing support for Jrun, what will future versions of CF use to accomplish clustering?
        • 1. Re: Help me understand clustering and fail-over better
          Grizzly9279 Level 1
          To give you some background on myself, I've been working with CF since 2000, and have quite a bit of clustering experience with ColdFusion 7 specifically. I have not had the opportunity to work with ColdFusion 8 yet, but I expect that the basic principles of configuring JRun are still the same.

          That being said, let me see if I can help.

          Q: "Is that a correct assumption or understanding of how CF cluster fail-over and load balance work?"
          A: Yes, that is correct. Assuming you configure your web server(s) to point to the cluster (and not to individual CF instances), this is exactly how it will behave. Load should be fairly evenly distributed amongst your CF instances. In the event that one CF instance goes down, the load will automatically shift over to the remaining CF instances. Assuming you're using J2EE session variables and have session replication enabled for the cluster, user sessions should replicate to the other cluster nodes, so even if your application is heavily grounded in session variables, your users should be unaffected by a cluster node going down.
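To make the behaviour described in this answer concrete, here is a toy Python model. It is a sketch, not how JRun's connector is actually implemented: `ToyCluster`, `handle()` and the "copy each session write to every live peer" replication are hypothetical simplifications. Requests are dealt out round-robin, stopped instances are skipped, and with replication on, a user's session survives the loss of the instance that last served it.

```python
class ToyCluster:
    """Hypothetical model of a connector in front of several CF instances.

    Not JRun's real algorithm: just round-robin with failover, plus an
    optional copy of each session write to every live peer.
    """

    def __init__(self, instances, replicate_sessions=True):
        self.instances = list(instances)
        self.up = set(self.instances)
        self.replicate = replicate_sessions
        # Per-instance session store: {instance: {session_id: {"hits": n}}}
        self.sessions = {name: {} for name in self.instances}
        self._next = 0

    def stop(self, name):
        self.up.discard(name)

    def start(self, name):
        self.up.add(name)

    def _pick(self):
        # Round-robin over the configured order, skipping instances that are down.
        for _ in range(len(self.instances)):
            name = self.instances[self._next % len(self.instances)]
            self._next += 1
            if name in self.up:
                return name
        # Mirrors the error reported later in this thread when nothing answers.
        raise RuntimeError("Could not connect to JRun Server")

    def handle(self, session_id):
        """Serve one request; return (instance that served it, session hit count)."""
        node = self._pick()
        hits = self.sessions[node].get(session_id, {"hits": 0})["hits"] + 1
        self.sessions[node][session_id] = {"hits": hits}
        if self.replicate:
            for peer in self.up:
                self.sessions[peer][session_id] = {"hits": hits}
        return node, hits


if __name__ == "__main__":
    cluster = ToyCluster(["cfusion1", "cfusion2", "cfusion3"])
    print([cluster.handle("user-a")[0] for _ in range(3)])  # ['cfusion1', 'cfusion2', 'cfusion3']
    cluster.stop("cfusion1")
    print(cluster.handle("user-a"))  # ('cfusion2', 4): the session survived the failure
```

With `replicate_sessions=False`, each instance only sees the requests it served itself, so the second request in a fresh cluster would report a hit count of 1 instead of 2.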

          Q: Why then did the instructions tell me to do this, or rather NOT tell me to uninstall the original Apache config?
          A: This part can be confusing. You are correct that when you originally installed CF, it may have automatically detected the Apache instance and configured it for ColdFusion. If that is the case, you're going to want to FIRST use wsconfig to remove any ColdFusion configuration that currently lives in your Apache config. This is because your Apache instance is likely configured to point to an individual ColdFusion instance, and not the cluster you just created. Once you have successfully uninstalled the pre-existing ColdFusion configuration from your Apache server, you should run wsconfig again to point Apache to your cluster, and say yes to "Configure server for CF8 applications". By saying "yes" to this, you're telling wsconfig to update your Apache config for you, so you don't have to manually go in and configure the .cfm/.cfml extensions, .lib includes, etc. Generally speaking, it's best to let wsconfig do this for you.

          Q: When I went back to my CF Admin for this box the Enterprise Manager nav button/control panel for managing my instances and clusters was gone! Why?
          A: What address are you using to access the CF Admin? It sounds like you were logging into the CF Admin of an individual CF instance, and not the "base" cfusion instance. ColdFusion should have installed an "internal" web server (separate from Apache) that allows you to access the various CF Admins by port. In JRun4, the JRun admin lives on port 8000, and the "base" cfusion instance lives on port 8300. Other instances set up beyond that start at port 8100 (by default) and increment from there (8101, 8102, 8103, etc.). The "base" cfusion instance on port 8300 should be your go-to CF Admin for configuring and managing the cluster.
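If you're not sure which of those internal-web-server ports is actually answering on a box, a quick TCP check can tell you. This is a minimal Python sketch: the port numbers are the defaults quoted in this answer and may differ on your install, and `is_listening`/`probe` are hypothetical helper names. A port that accepts a connection only tells you that something is listening there, not that it is the CF Admin.

```python
import socket

# Default JRun4 / CF multiserver ports as quoted in the answer; adjust for your install.
DEFAULT_PORTS = {
    "JRun admin": 8000,
    "cfusion (base instance)": 8300,
    "first additional instance": 8100,
}


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe(host="localhost", ports=None):
    """Map each label in `ports` to whether that port is accepting connections."""
    ports = DEFAULT_PORTS if ports is None else ports
    return {label: is_listening(host, port) for label, port in ports.items()}


if __name__ == "__main__":
    for label, up in probe().items():
        print(f"{label:28} {'listening' if up else 'closed'}")
```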

          Other notes:
          If you have multiple web servers (instances of Apache), you're going to want to configure each of them individually with wsconfig using the approach I described above.

          Q: Wanting to get to my real goal, how do I now TEST that my cluster (even though I can't see it in the CF Admin any more) will actually fail over to the remote instances I included in it? And how do I TEST load balancing?
          A: I would start by setting up a really basic "index.cfm" page and deploying it to each CF instance in the cluster. In ColdFusion 7 at least (JRun4, JRE 1.4.2), the following code will allow you to output the name of the individual cluster node that you are on.

          <cfset thisServer = createObject( "java", "jrunx.kernel.JRun" ).getServerName()>
          <cfoutput>Hello from #thisServer#!</cfoutput>

          I would wager that this code would still work on CF8. If not, I'm sure Google can get you pointed in the right direction.

          This should allow you to try making different requests to your cluster from different web browsers, and witness which cluster node each request is directed to. If you're using "sticky sessions" in your cluster configuration, you should find that once you make a request to the cluster, your session will tend to "stick" to one CF instance. Otherwise, you should bounce back and forth between cluster instances, seemingly at random.
          Also, feel free to poke around the logs found in .../JRun4/logs/, on each instance. That can also give you a good idea of what's going on under the hood, and what sort of activity each cluster node is getting.
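Rather than clicking Refresh by hand, you can script the requests and count which node answered. Below is a minimal Python sketch using only the standard library. It assumes the test page prints `Hello from <server name>!` exactly as in the CFML snippet above; `node_from_body`, `sample` and the example URL are hypothetical. By default it sends no cookies, so every request looks like a new visitor; `keep_cookies=True` reuses session cookies, which is when you would expect sticky sessions to pin you to one instance.

```python
import http.cookiejar
import re
import urllib.request
from collections import Counter

# Matches the "Hello from <server name>!" line the CFML test page prints.
NODE_PATTERN = re.compile(r"Hello from ([^!<\s]+)!")


def node_from_body(body):
    """Return the server name printed by the test page, or None if absent."""
    match = NODE_PATTERN.search(body)
    return match.group(1) if match else None


def sample(url, requests=20, keep_cookies=False, timeout=5.0):
    """Request `url` repeatedly and count which node (or error) answered."""
    handlers = []
    if keep_cookies:
        # Reusing cookies lets sticky sessions keep us on one instance.
        handlers.append(urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))
    opener = urllib.request.build_opener(*handlers)
    counts = Counter()
    for _ in range(requests):
        try:
            with opener.open(url, timeout=timeout) as resp:
                body = resp.read().decode("utf-8", "replace")
            counts[node_from_body(body) or "unrecognized"] += 1
        except OSError as exc:  # URLError, HTTPError and timeouts are all OSErrors
            counts[f"error: {type(exc).__name__}"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical URL: point this at your own Apache front end.
    print(sample("http://box1.example.com/index.cfm"))
```

Running it while you stop and start instances (as suggested next) also shows fail-over: error counts appear briefly, then the remaining node names take over.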

          Once that's done, try issuing requests to your cluster while you bring individual instances up and down. That should give you some idea on how the cluster manages fail over scenarios.

          Q: Is there any difference in the way CF8 handles clustering and load balance compared to CF7 or even CF6.1?
          A: I don't know for sure, since I've never clustered a CF 6 or CF 8 environment myself. My general impression is that the technology and the approach hasn't changed much over the years though. I wish I could give you more information on that...but my lack of experience with those other versions prevents me from doing so. Perhaps someone else with CF 8 clustering experience can chime in on this matter.


          Best of luck!
          • 2. Re: Help me understand clustering and fail-over better
            DCwebGuy Level 1
            Wow, this is a great response and helps a lot. I am going to reinstall CF and tell it to use its own built-in server so that I can later point Apache to the cluster and not run into this "already configured for JRun" issue. Either that, or find a way to run multiple Apache instances, but I am relatively new to Apache and don't know how to do that yet (or if it's a good idea).

            One clarification. If the single box running my cluster goes down, won't that just stuff everything? Where is my fail-over safety net under that condition?

            I thought about this and wondered if I should create several clusters, one on each box (all pointing to unique remote instances). Will that work? I want a "lights out" approach to fail-over so that if any one of the 3 boxes goes down, the others will pick up. I'm talking BOXES, not just instances.

            Thanks.
            • 3. Re: Help me understand clustering and fail-over better
              Grizzly9279 Level 1
              That sounds like a great plan. To get you more comfortable with this install/setup process (so you can reliably replicate it), I would definitely recommend starting clean, doing fresh CF installs, and instructing the installer to ONLY use its own built-in web server, instead of auto-configuring Apache for you. This will produce cleaner results once your cluster is set up and you use wsconfig to configure Apache to point to the cluster.

              Q: If the single box running my cluster goes down, won't that just stuff everything? Where is my fail-over safety net under that condition?
              A: If you have only one physical box hosting all of your CF instances in the cluster, then yes, you are losing something from a high availability perspective. If that one box were to run into a fatal/critical issue at the OS, firmware, or hardware level, your entire cluster would be impacted (obviously). If you have additional physical machines available to you, I would highly recommend distributing your CF instances across those additional machines, before "doubling-up" one machine.

              Our production environment has only 2 CF instances, each installed on their own entirely separate hardware. We've found it extremely beneficial to have these instances straddled across separate machines, since we have the freedom to take one machine completely offline for maintenance without negatively impacting the cluster.

              Another thing I thought I should mention: what sort of memory do you have available to you on these servers? I believe JRun4 defaults to a maxheap of 512MB. This means that each CF instance installed under that JRun4 installation is capable of consuming up to 512MB of heap (plus some JVM overhead) before JRun will put a stop on it. We actually have our JRun installs configured for a maxheap of 1024MB, since our CF instances tend to be fairly memory hungry. Our application handles quite a lot of unique users, each having fairly "heavy" sessions. Since our app servers only have 2 GB of RAM available, we're actually very limited in how many CF instances we can deploy to one machine.

              So, if you have some idea on how much memory each CF instance is going to require (at max), that will give you a good idea on how many instances you can squeeze onto one machine.
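As a back-of-the-envelope version of that sizing advice, here is a small Python helper. It is only arithmetic under assumptions: `reserved_mb` (memory for the OS, Apache and so on) and `overhead_mb` (JVM memory outside the heap, such as thread stacks and PermGen) are placeholder guesses, not measured or documented figures, and `max_instances` is a hypothetical name.

```python
def max_instances(total_ram_mb, max_heap_mb, reserved_mb=512, overhead_mb=0):
    """Rough count of CF instances whose max heap fits in RAM on one box.

    reserved_mb and overhead_mb are assumptions; replace them with
    numbers measured on your own servers.
    """
    usable = total_ram_mb - reserved_mb
    per_instance = max_heap_mb + overhead_mb
    if usable <= 0 or per_instance <= 0:
        return 0
    return usable // per_instance


if __name__ == "__main__":
    # The 2 GB boxes described above: 1024 MB heaps leave room for one instance,
    # while the 512 MB default would allow three.
    print(max_instances(2048, 1024))  # 1
    print(max_instances(2048, 512))   # 3
```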

              To clarify, I don't think you want to set up "several clusters", but rather, one cluster....which is comprised of many CF instances that live on separate boxes. You can define and configure your cluster, and all of its associated remote instances on one machine. You can then point your Apache web server to that cluster (by name). Please keep in mind that each CF instance on your network needs to have its own unique name, in order for it to play nicely with the cluster. (cfusion1, cfusion2, cfusion3, etc...or however you would like to name them)

              I hope this helps. Feel free to come back with more questions if you run into any obstacles.
              • 4. Re: Help me understand clustering and fail-over better
                DCwebGuy Level 1
                Grizzly, thanks for your continued support. I think I'm doing something wrong, the cluster is not acting like I expect.

                I've done like you say and created a cluster named "box1" that is made up of 1 instance called "cfusion1" on box #1, one remote instance from box #2 called "cfusion2", and one remote instance from box #3 called "cfusion3".

                However, when I try to test fail-over by terminating "cfusion1" under the box1 cluster the site fails and I get a server error "Could not connect to JRun Server". What I expected was that box #2 or box #3 would have picked up.

                By the way, I inserted some test code in my index.cfm script on each server to display what instance it's coming from, and on box #1 that instance name never changed even though I have round-robin set up and NO sticky or session replication (just to make things simple for now). The code is <cfset thisServer = createObject( "java", "jrunx.kernel.JRun" ).getServerName()>.

                Now when I reconfigured Apache on box #1 to point to the cluster called "box1", it had the effect of moving my default web root from /var/www/ to something way down the directory chain here /opt/jrun4/servers/cfusion1/cfusion1.ear/cfusion1.war and so that's where my site is running (at least that's where it is publicly accessible now).

                Does that make sense? This is where I lost the Enterprise Manager button and had to go through port 8300 to see it again (on the local desktop).

                Let's not discuss RAM yet, because I need to see that the cluster is failing over and balancing before I start tweaking RAM. I have enough to put at least 2 instances on each box in case that helps somehow.

                I think the problem may be with Apache sending my web root to a new location under /opt/jrun4/servers like I mentioned above. I really expect I can still run my site under /var/www/, but Apache isn't pointing there any more. What have I done wrong?

                My whole goal is to get CF failing-over and load balancing between physical boxes. Thanks again. I know I'll get through this. :)



                • 5. Re: Help me understand clustering and fail-over better
                  Grizzly9279 Level 1
                  At what point did you configure Apache to point to the "box1" cluster?

                  If you pointed Apache to your cluster before you added the remote instances (from box2 and box3), that could be your problem. I could be wrong, but I believe that I have had to uninstall/reinstall wsconfig on my web servers after adding new CF instances before.

                  Here are some examples from my personal scripts (we're hosting on Windows 2003 these days... so this is IIS-centric):
                  • 6. Help me understand clustering and fail-over better
                    DCwebGuy Level 1
                    quote:

                    Originally posted by: Grizzly9279
                    Our production environment has only 2 CF instances, each installed on their own entirely separate hardware. We've found it extremely beneficial to have these instances straddled across separate machines, since we have the freedom to take one machine completely offline for maintenance without negatively impacting the cluster.

                    I hope this helps. Feel free to come back with more questions if you run into any obstacles.



                    Okay, I am getting closer. I have my cluster on box1 set up with two local instances and two remote instances, so 4 instances total on box1. The Apache web server on both box1 and box2 is pointed to the "cluster". I have round-robin set up, yet when I go to the DNS name for box1 and hit Refresh, I see my ColdFusion test page going back and forth between only the local instances. It does NOT bounce around between the local and remote instances. This appears to be a problem, as I expected to see every instance in the cluster get pinged when I hit Refresh. When I go to the DNS name for box2 and hit Refresh, it bounces around between the two instances I created there, which makes perfect sense to me, as box2 is not the "home" of the cluster.

                    So my question is: why is the cluster of 4 instances on box1 not routing my requests to the remote instances? Is it a web server config issue, or a CF issue...for example am I supposed to replicate the cluster on box2 using the same cluster name so both boxes have the exact same instances and clusters?

                    Second question, which is related to your quote above: you say you can take down ONE machine and not have it impact the cluster, but I imagine that one machine cannot be the machine where the cluster is configured, correct? If not, please explain how you get around this. That's why I thought I needed multiple clusters.

                    Thanks again for any help you can offer. One more step to go and I'll have it all figured out!
                    • 7. Re: Help me understand clustering and fail-over better
                      Grizzly9279 Level 1
                      It sounds like you're definitely getting close!

                      Now...let me first throw a little disclaimer out there before I give any further advice. My understanding of "cluster ownership" has always been a little hazy. I've developed an approach to defining and managing clusters which works for me. I have made some assumptions based on my experiences and observations over the years, but... it isn't necessarily the best way. I've had no formal training in this, nor have I actually read books or documentation on the subject cover to cover. What I know, I only know through trial and error, and my observations and experiences over the years. So without further ado...

                      To confirm, you have 2 boxes (BOX1 and BOX2). BOX1 hosts 2 CF instances, let's call them "CF1" and "CF2". BOX2 hosts 2 CF instances, let's call them "CF3" and "CF4". (As I said before, it is important that each CF instance has a unique name on the network) Each box also hosts 1 instance of Apache web server. That being the case, here is exactly how I would approach it.

                      Log into the :8300 instance on BOX1. In there, you should at the very least see two local instances (CF1, and CF2). If not already configured, register the 2 other remote instances from BOX2 (CF3, and CF4). Once you have all 4 instances registered on BOX1, define a cluster...let's call it "myCluster". Assign all 4 instances to "myCluster" defined on BOX1.

                      Next, you're going to do the exact same thing on BOX2. Log into the :8300 instance on BOX2. In there, you should see two local instances (CF3, and CF4). If not already configured, register the 2 other remote instances from BOX1 (CF1, and CF2). Once you have all 4 instances registered on BOX2, define the cluster again using the same name as you did on BOX1 ("myCluster"). Assign all 4 instances to "myCluster" on BOX2.

                      At this point you should have both machines "owning" a cluster by the name "myCluster". All 4 instances, CF1, CF2, CF3, and CF4 are all members of "myCluster", and it does not matter if you're talking about "myCluster" defined on BOX1, or "myCluster" defined on BOX2. They're both essentially hosting the same cluster configuration.
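Because the whole point of this step is that both boxes hold the same definition, it can help to write down what each box's Cluster Manager shows and compare. The Python sketch below does that comparison offline; it does not talk to ColdFusion or JRun, and `check_mirrored` and its dictionary layout are hypothetical. It also flags duplicate instance names, since this reply stresses that every instance needs a unique name.

```python
def check_mirrored(definitions, cluster="myCluster"):
    """Compare one cluster's membership as recorded on each box.

    definitions: {box: {cluster_name: [instance names]}}, typed in by hand
    from each box's Cluster Manager. Returns a list of problems; empty means OK.
    """
    problems = []
    memberships = {}
    for box, clusters in sorted(definitions.items()):
        if cluster not in clusters:
            problems.append(f"{box}: no cluster named {cluster!r}")
            continue
        names = clusters[cluster]
        duplicates = sorted({n for n in names if names.count(n) > 1})
        if duplicates:
            problems.append(f"{box}: duplicate instance names {duplicates}")
        memberships[box] = frozenset(names)
    if len(set(memberships.values())) > 1:
        detail = {box: sorted(m) for box, m in memberships.items()}
        problems.append(f"membership differs between boxes: {detail}")
    return problems


if __name__ == "__main__":
    recorded = {
        "BOX1": {"myCluster": ["CF1", "CF2", "CF3", "CF4"]},
        "BOX2": {"myCluster": ["CF3", "CF4", "CF1", "CF2"]},
    }
    print(check_mirrored(recorded))  # []
```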

                      Once that is done, it is now time to configure Apache. You're going to want to run through the wsconfig setup again, and as you probably already know, you have to point wsconfig to one specific host, and you can also specify a cluster name. Since BOX1 and BOX2 each own their own definition of "myCluster", it does not matter which host you point wsconfig to. Either BOX1 or BOX2 should work perfectly fine.

                      Now this is where my understanding gets a little hazy. Here is how I think it works. When you FIRST configure a web server with wsconfig, and you point it to a cluster...the wsconfig utility will query the target host for the cluster definition, and it will store that cluster definition locally on the web server itself. So if you point wsconfig to "myCluster" on BOX1...wsconfig will query BOX1 and pull back the cluster definition in its entirety. At this point, the web server KNOWS that it's pointed to "myCluster", which is comprised of instances CF1, CF2, CF3, and CF4. From this point onward, the web server does not care which machine actually hosts the cluster definition, since it maintains its own local copy of it.

                      If I were you, I'd probably point the Apache instance on BOX1 to the cluster definition on BOX1. I would also point the Apache instance on BOX2 to the cluster definition on BOX1 (though it really shouldn't matter).

                      Even if both web servers were pointed to BOX1 to initially configure the JRun connector, you should be able to take BOX1 completely offline, and both web servers will happily utilize the 2 remaining CF instances on BOX2 (CF3, and CF4).
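That mental model (wsconfig copies the cluster definition once, and the web server then routes from its local copy) can be sketched as a toy in Python. It only models the description in this reply: `Connector`, `configure_from` and `route` are hypothetical, and Steven Erat's later reply treats the bootstrap instance as a possible single point of failure, so don't read this as verified connector behaviour.

```python
class Connector:
    """Toy web-server connector holding its own copy of a cluster definition."""

    def __init__(self, members):
        self.members = dict(members)  # instance name -> box it runs on
        self._turn = 0

    @classmethod
    def configure_from(cls, box, definitions, cluster):
        # One-time query of the chosen box; the result is copied locally.
        return cls(definitions[box][cluster])

    def route(self, boxes_up):
        """Round-robin over the local copy, skipping instances on boxes that are down."""
        names = sorted(self.members)
        for _ in range(len(names)):
            name = names[self._turn % len(names)]
            self._turn += 1
            if self.members[name] in boxes_up:
                return name
        raise RuntimeError("no live instance in the cluster")


if __name__ == "__main__":
    members = {"CF1": "BOX1", "CF2": "BOX1", "CF3": "BOX2", "CF4": "BOX2"}
    definitions = {"BOX1": {"myCluster": members}, "BOX2": {"myCluster": members}}
    apache_on_box2 = Connector.configure_from("BOX1", definitions, "myCluster")
    # BOX1 goes offline; the connector still knows about CF3 and CF4.
    print([apache_on_box2.route({"BOX2"}) for _ in range(4)])  # ['CF3', 'CF4', 'CF3', 'CF4']
```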

                      Once you're pretty sure you have everything set up, you should be able to hit either instance of Apache in a web browser, and see your requests flip-flop between all 4 instances of CF.

                      Again, it is very important that all 4 CF instances have unique names. I have run into a lot of weird and quirky issues before having two CF instances with the same name, on the same network (subnet).

                      Anyways...I hope this helps! Like I said, this may not be the best way, but it has worked pretty well for me in the past. Best of luck with it, and definitely come back to let us know how it goes.
                      • 8. Re: Help me understand clustering and fail-over better
                        StevenErat Level 1
                        This is a long thread, and I've only read the first entry so far. Here is my response to the first entry, and when I catch up to the end I'll post more....
                        -------------------------------



                        quote:


                        Box 1, let's call it, has my cluster, which contains 1) a single instance of itself, 2) a single remote instance from Box 2, and 3) a single remote instance from Box 3. I've made the assumption that having multiple boxes is where the real benefit from load balance and fail-over kicks in. E.g., if Box 1 goes down, Box 2 and 3 will pick up, and vice versa. Is that a correct assumption or understanding of how CF cluster fail-over and load balance work?



                        Defining a single cluster across three instances on three separate boxes introduces a single point of failure. In a seemingly robust setup, a webserver instance on a fourth box would be configured to connect to the ColdFusion cluster through the first CF instance on box 1 (a.k.a. the bootstrap server). If that CF instance on box 1 fails (by crashing or queuing excessively), then the webserver connector on box 1 will begin to load balance traffic to the CF instances on box 2 and box 3. The load balancing configuration would be functioning properly at this point. However, a serious problem arises if box 1 goes down entirely (say, a hardware failure): that webserver connector on box 1 would not be able to route traffic to box 2 or box 3, since neither is directly connected to a webserver. Hence, the CF instances on box 2 and 3 would not be able to handle requests from the webserver on box 4, because the point of entry (the connector on box 1) is unavailable.

                        Fortunately, an additional layer of redundancy solves this problem (partly). To eliminate the single point of failure created by having only one cluster definition with one webserver connector, create 3 clusters, each having 3 instances across the 3 boxes, so that every cluster has a member on each box. To clarify, this is how I would set it up:


                        Box 1) Three CF instances, called Box1-A, Box1-B, and Box1-C
                        Box 2) Also three CF instances, Box2-A, Box2-B, Box2-C
                        Box 3) Same... Box3-A, Box3-B, Box3-C
                        Box 4) A webserver having 3 virtual instances, say www1.company.com, www2.company.com, and www3.company.com, where all three are round-robined by a single www.company.com DNS entry.
                        Cluster A) Created on Box1 having cluster members Box1-A, Box2-A, Box3-A
                        (Bootstrap server is Box1-A)
                        Cluster B) Created on Box2 having cluster members Box1-B, Box2-B, Box3-B
                        (Bootstrap server is Box2-B)
                        Cluster C) Created on Box3 having cluster members Box1-C, Box2-C, Box3-C
                        (Bootstrap server is Box3-C)

                        In this configuration, each webserver instance on Box 4 is independently configured to connect to either Cluster A, Cluster B, or Cluster C respectively.

                        This is a much more redundant and reliable configuration than the previous one. Now, any of box1, box2, or box3 can go down entirely and traffic will be received by the remaining clusters and balanced across the boxes still up.
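This layout, and the argument for why it survives the loss of a whole box, can be checked with a few lines of Python. The sketch encodes the stated assumption that a cluster is reachable only through its bootstrap instance's box; `build_topology` and `serving_instances` are hypothetical names, and this models the argument rather than JRun's actual connector logic.

```python
def build_topology(boxes=("Box1", "Box2", "Box3"), letters="ABC"):
    """Cluster <letter> gets member <box>-<letter> on every box; the i-th
    cluster bootstraps from the i-th box (Cluster A from Box1-A, and so on)."""
    return {
        f"Cluster {letter}": {
            "bootstrap": boxes[i],
            "members": {f"{box}-{letter}": box for box in boxes},
        }
        for i, letter in enumerate(letters)
    }


def serving_instances(clusters, down_box):
    """Instances that can still take traffic when down_box fails, assuming a
    cluster is reachable only while its bootstrap box is up."""
    live = set()
    for spec in clusters.values():
        if spec["bootstrap"] == down_box:
            continue  # entry point lost: the whole cluster is unreachable
        live |= {name for name, box in spec["members"].items() if box != down_box}
    return live


if __name__ == "__main__":
    topology = build_topology()
    for box in ("Box1", "Box2", "Box3"):
        print(box, "down ->", sorted(serving_instances(topology, box)))
    # A single cluster bootstrapped from Box1 has nothing left when Box1 dies.
    print(serving_instances({"Cluster A": topology["Cluster A"]}, "Box1"))  # set()
```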

                        Unfortunately, there's still a flaw here since there is still a single point of failure, box4 having 3 instances of a webserver. Should box4 fail, then boxes 1, 2, and 3 are dead in the water.

                        The solution here is obvious, convert a single box having 3 webserver instances to 3 boxes (box 4, 5, and 6) having independent webservers each connected to one of the CF cluster definitions.

                        Perhaps the best resources for understanding redundant ColdFusion clustering are Brandon Purcell's articles on Adobe.com:

                        End to End Clustering with ColdFusion by Brandon Purcell

                        Another good resource is this blog entry which surveys a variety of articles on CF clustering:

                        Clustering Roundup by Brandon Harper





                        quote:

                        Now here's one of my actual questions: after all this it comes back saying "This web server is already configured for Jrun". 99% of the time I'm sure folks will have a web server already running when they install CF, so of course CF is going to configure it. Why then did the instructions tell me to do this, or rather NOT tell me to uninstall the original Apache config?



                        The wsconfig utility (Web Server Configuration utility) examines the file {CF_ROOT}/runtime/lib/wsconfig/wsconfig.properties to determine whether any webservers are already configured. (On full JRun with CF, it will be {JRUN_ROOT}/lib/wsconfig/wsconfig.properties.)

                        For example, an entry like this would indicate one instance called coldfusion on localhost is configured at the global level (0) for IIS:
                        1=IIS,0,true,""
                        1.srv=localhost,"coldfusion"
                        1.cfmx=true,C:/CFusionMX7/wwwroot

                        It's possible that a previous attempt to configure the webserver wrote an entry to this file even though the configuration failed. If wsconfig tells you that an instance of CF is already configured for the webserver, then you should check what's in this file, and also check whether there are any leftovers from a previous misconfiguration.
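If you want to check that file on several machines, a small parser helps. The Python sketch below groups the numbered entries and reports the web server type plus the host and instance from each `N.srv` line, using the example entries above as its test input. Beyond what is stated here (web server type, and 0 meaning the global level), the remaining fields are left as raw strings because their meaning isn't described; the parser also ignores Java-properties features such as escapes and line continuations, and the function names are hypothetical.

```python
from collections import defaultdict


def parse_wsconfig(text):
    """Group numbered entries: {"1": {"": 'IIS,0,true,""', "srv": ..., "cfmx": ...}}."""
    entries = defaultdict(dict)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue  # skip blanks, comments and lines we don't understand
        key, value = line.split("=", 1)
        index, _, suffix = key.strip().partition(".")
        if index.isdigit():
            entries[index][suffix] = value.strip()
    return dict(entries)


def configured_webservers(text):
    """Return (index, server type, host, instance) for each configured entry."""
    rows = []
    for index, fields in sorted(parse_wsconfig(text).items(), key=lambda kv: int(kv[0])):
        server_type = fields.get("", "").split(",")[0]
        host, _, instance = fields.get("srv", "").partition(",")
        rows.append((index, server_type, host, instance.strip('"')))
    return rows


if __name__ == "__main__":
    sample = '1=IIS,0,true,""\n1.srv=localhost,"coldfusion"\n1.cfmx=true,C:/CFusionMX7/wwwroot\n'
    print(configured_webservers(sample))  # [('1', 'IIS', 'localhost', 'coldfusion')]
```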

                        quote:


                        So at that point I ran config again, REMOVED the Apache web server, and re-added it using my cluster (followed the exact same instructions from above). This time it worked fine. But here's where it gets interesting.


                        Running REMOVE clears out the wsconfig.properties file and cleans up any leftovers from a previous misconfiguration.

                        quote:


                        When I went back to my CF Admin for this box the Enterprise Manager nav button/control panel for managing my instances and clusters was gone! Why?



                        I've not heard of this problem. I'd like to try to set this up to see if I can replicate it, but since I'm not on the ColdFusion support or engineering team anymore it may be difficult to get the time to do so.

                        quote:


                        Is there any difference in the way CF8 handles clustering and load balance compared to CF7 or even CF6.1?



                        Fundamentally, no. CF7 introduced the Cluster Manager in the CF Admin (previously this was done through the JRun Admin followed by many manual steps). CF7 and CF8 have brought bug fixes, but essentially it has been the same for the last 5 years.

                        quote:

                        Since Adobe is discontinuing support for Jrun, what will future versions of CF use to accomplish clustering?


                        JRun server is an integral part of ColdFusion Server Config and ColdFusion Multiserver Config. You can expect support for it as long as you have support for the corresponding version(s) of ColdFusion that utilize JRun. You may have read references to JRun being discontinued as an independently sold product, which may have been confusing on this point.
                        • 9. Re: Help me understand clustering and fail-over better
                          DCwebGuy Level 1
                          Grizzly, I finally got it working! Between your experience and some additional feedback from an Adobe engineer I emailed, I got over the technical hurdles, and everything is working exactly the way I expect. Erat, your high-availability model above is excellent, and really helps guide me on how to take things to the next level.

                          The Adobe engineer gave me some tips on configuring the JRun XML files manually to do what I needed. Here is what he said, and it worked perfectly for me. Again, I am using Amazon EC2 machines that are not on the same subnet, which may be why I was running into some difficulty. By the way, when the engineer refers to the "Admin page of JRun", all this really means is the instance and cluster manager that is now part of the CF Enterprise Administrator (since version 7, I believe).

                          Say you have two JRun servers on two different machines (say A and B). First, go to JRun4\lib and open the security.properties file. You will see the line jrun.subnet.restriction=255.255.255.0; change it to jrun.subnet.restriction=* on both JRun servers. Now, on the machine where you want to configure the cluster (say A), first register the remote server (i.e. B) by clicking “Register Remote Server” in the Admin page of JRun and providing the necessary details. If you have not replaced the value with * in security.properties, you will not get any status information for the remote server, that is, whether it is running or stopped, and on the remote server you will get the error “error Security alert:attempt to connect to JRun server from a A host”.

                          In jrun.xml (located in JRun4\servers\<server name>\SERVER-INF) on both servers you want to include in the cluster, add <attribute name="unicastPeer">ip address of the other machine</attribute> under the “ClusterManager” service (i.e. the IP address of A in B’s server, and vice versa).
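For anyone repeating this step on more than two boxes, it can be scripted. The following is a minimal Python sketch, not something from the Adobe engineer: the helper name add_unicast_peer is hypothetical, and it assumes the ClusterManager service appears in jrun.xml as a service element with name="ClusterManager" that holds attribute children. Python's ElementTree drops comments, the XML declaration and any DOCTYPE when it writes the tree back out, so keep a backup and compare the output with the original before replacing the real file.

```python
import xml.etree.ElementTree as ET


def add_unicast_peer(jrun_xml: str, peer: str) -> str:
    """Add <attribute name="unicastPeer">peer</attribute> to the
    ClusterManager service of a jrun.xml document passed in as a string.

    Hypothetical helper. Assumes the service is an element like
    <service ... name="ClusterManager">. ElementTree discards comments,
    the XML declaration and the DOCTYPE on output, so diff the result.
    """
    root = ET.fromstring(jrun_xml)
    for service in root.iter("service"):
        if service.get("name") != "ClusterManager":
            continue
        # Leave the document unchanged if this peer is already configured.
        for attr in service.findall("attribute"):
            if attr.get("name") == "unicastPeer" and (attr.text or "").strip() == peer:
                return ET.tostring(root, encoding="unicode")
        # Otherwise append a new unicastPeer attribute element.
        new_attr = ET.SubElement(service, "attribute", {"name": "unicastPeer"})
        new_attr.text = peer
        return ET.tostring(root, encoding="unicode")
    raise ValueError("no ClusterManager service found")
```

Following the engineer's "vice versa", you would call it with A's address when editing B's jrun.xml and with B's address when editing A's.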


                          I had never messed with the unicastPeer stuff, nor had I set jrun.subnet.restriction to *. What I was doing was what the Adobe instructions said, which was to update the jrun.trusted.hosts line to include the IPs of my machines. Instead, after setting jrun.subnet.restriction to *, I didn't have to worry about trusted hosts. All the instances were picked up just fine, and I didn't even have to reboot.
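Since the fix came down to one line in security.properties on every box, here is a small Python sketch that makes that edit on the file's text. The helper name is hypothetical; it assumes the property is a plain key=value line, as quoted by the engineer, leaves commented-out lines alone, and appends the property if it is missing.

```python
import re

# Matches an uncommented "jrun.subnet.restriction=<value>" line.
# Hypothetical pattern: assumes simple key=value properties syntax.
_SUBNET_LINE = re.compile(r"^([ \t]*jrun\.subnet\.restriction[ \t]*=).*$", re.MULTILINE)


def open_subnet_restriction(properties_text: str) -> str:
    """Return properties_text with jrun.subnet.restriction set to '*'."""
    if _SUBNET_LINE.search(properties_text):
        # Keep the key and its spacing, replace only the value.
        return _SUBNET_LINE.sub(r"\g<1>*", properties_text)
    # Property missing: append it on its own line.
    needs_newline = bool(properties_text) and not properties_text.endswith("\n")
    return properties_text + ("\n" if needs_newline else "") + "jrun.subnet.restriction=*\n"
```

Read the file, pass its contents through the function, and write the result back on each box, then check it by hand as you would any config change.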

                          Finally, as you said Grizzly, I created identically named clusters on each box, each containing the remote instances of the other, pointed Apache2 to the localhost cluster, and now my test file bounces around between all the instances in the system, just as I was hoping.

                          Now, if anyone wants to help me take the next step, I need to figure out how DynDNS or some other dynamic DNS service works so I can point my domain name at several servers and have requests served among them. Right now, if I stop either of my two boxes, I need my domain to point to the "working" box. I can't use hardware; I need a software or web services solution to this.

                          Also, my IP addresses are NOT guaranteed to be static in the Amazon environment. It turns out I was able to use the Amazon DNS names for both the cluster instances and the unicastPeer. I rebooted after making these changes and everything still worked properly. That is good news for anyone who wants to do this on Amazon EC2 or anywhere else you are not guaranteed static IPs. I am not sure how other "intelligent" load balancing programs will deal with DNS, but the round-robin built into CF is fine for me right now.

                          Again, thanks for sharing all your experience and time with me. It's made a huge difference, and reconfirms the CF community (in my mind anyway) as one of the best on the internet.
                          • 10. Re: Help me understand clustering and fail-over better
                            Atul Paralikar Level 1
                            @DCwebGuy & Grizzly,

                            We are trying to set up cluster load balancing on a CF8 server with 4 instances on 2 different physical boxes. Below are our server configuration details:

                            Fedora 9 64-bit, CF8 64-bit Enterprise trial edition.

                            Setup Details:
                            2 Linux boxes, 2 CF instances on each box. CF is set up in the multiserver configuration, as described in the Adobe help and in your posts. We created 2 CF instances on each box: CF1 and CF2 on Box 1, and CF3 and CF4 on Box 2.

                            Issue:
                            The issue we are having is that when we add a Box 2 instance to Box 1 as a remote server, it shows "Network Error", whereas when we check the same thing in the JRun Admin console, it says "The server "cf3" is unavailable for administration on its specified host "10.10.0.14". Please make sure that the server exists, and that at least one server is running and registered with the JMC on the given machine."

                            We have done the following after going through some posts:

                            - jrun.subnet.restriction=*
                            - jrun.trusted.hosts=*
                            - We telnetted to the port numbers of the running instances on Box 2 from Box 1, and vice versa, and we can successfully connect to each of them from either box.
                            - We have disabled the firewall on both servers.
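The telnet test above can be repeated for every instance port from a script. A minimal Python sketch using only the standard library; the function name is hypothetical, and the host/port pairs you feed it are your own instance ports (none are assumed here). Like telnet, it only shows that a port accepts TCP connections, not that JRun remote administration will succeed over it.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout.

    Same check as a manual telnet: it proves the port accepts
    connections, nothing more about the protocol running on it.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unresolvable host, etc.
        return False
```

For example, run port_reachable("10.10.0.14", port) from Box 1 for each of cf3's and cf4's ports, then the same checks from Box 2 toward Box 1.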

                            Can you please guide us on the above? Please let us know:
                            - Do we need to do anything else because this is a 64-bit server?
                            - Is there any issue with the 64-bit edition of CF8 on Linux?
                            - Do we need to connect CF to the web server first and then configure the instances?

                            Thank you in advance.