3 Replies. Latest reply: Aug 26, 2014 5:14 PM by carl type3

    How To: CF11 Clustering without Multicast (AWS)

    scottberry Community Member

      This week I've been working on getting clustering set up for a client. Initially we were using CF10 with the latest patches. Ideally we wanted non-sticky load balancing with session replication: really high availability, with the option to reboot a server at any time without having to wait for session draining or losing customers if a node goes down. Adam Cameron points out in his CFML blog post "Problem with session replication with CF10 clustering" that CF10 has an issue where there is no option to turn on session replication. After trying various fixes I still could not get the session to replicate, so we moved to CF11, which restores that option. There is a bug open for CF10 with some weird responses, but I never saw any sort of fix for it.

       

      CF11, as noted, solves this odd issue, so I thought we were in the clear. Following the limited cluster setup guides found online, there is some manual configuration to do on the remote instance. First, I am not sure whether the default cfusion instance simply can't be used as a member of a cluster, but I had a hard time ever getting it to work, so both the local and remote instances are new CF11 instances created from within the Instance Manager. The Adobe instructions ("Adobe ColdFusion 10 * Enabling clustering for load balancing and failover") are mostly correct in that you have to copy the <Cluster> node to the remote instance. One issue pointed out in a few places is that the cluster block has to actually go IN the <Host> node and not after it. CF10, CF11 and maybe even CF9 put the block after the </Host> tag (and the documents suggest putting it there) which, in my experience, does not work.
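
      As a rough sketch of the placement described above (not a complete server.xml): the Host attributes shown are placeholders for whatever the instance's file already contains, and the comments stand in for elided content. Only the position of the cluster block matters here.

      ```xml
      <!-- Host attributes are assumed for illustration; keep the ones already in your file -->
      <Host name="localhost" appBase="webapps">
        <!-- existing children of this Host stay as they are -->

        <!-- the cluster block goes here, inside the Host element -->
        <Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster">
          <!-- Manager, Channel, Valve and ClusterListener children go here -->
        </Cluster>
      </Host>
      <!-- a Cluster block placed here, after the closing Host tag, did not work in the testing described above -->
      ```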

       

      After everything was configured and I started up my test, I could not get the remote node to respond at all. Looking in the CF error log, I constantly saw this line:

       

      INFO: Manager [/]: skipping state transfer. No members active in cluster group.

       

      Digging into the Tomcat clustering discussions, this basically means the cluster couldn't find the remote instance. By default CF uses the multicast cluster support in Tomcat and doesn't offer an option to do anything different. Researching this, I found that AWS does not support broadcast or multicast in EC2. Further research showed how Tomcat can be configured with static cluster membership, so I modified the server.xml files to match and voilà, a cluster with session replication. Using the ELB on AWS we have sticky sessions disabled (basically round-robin style requests) and the requests bounce evenly between the instance members. The session IDs, however, stay the same on each page load even though the request is going to a different host.

       

      So here is what the cluster node of the server.xml looks like:

       

      <Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster" channelSendOptions="8" channelStartOptions="3">
        <Manager notifyListenersOnReplication="true" expireSessionsOnShutdown="false" className="org.apache.catalina.ha.session.DeltaManager"/>
        <Channel className="org.apache.catalina.tribes.group.GroupChannel">
          <!--<Membership port="45564" dropTime="3000" address="228.0.0.4" className="org.apache.catalina.tribes.membership.McastService" frequency="500"/>-->
          <Receiver port="4001" autoBind="100" address="auto" selectorTimeout="5000" maxThreads="6" className="org.apache.catalina.tribes.transport.nio.NioReceiver"/>
          <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
            <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"/>
          </Sender>
          <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpPingInterceptor"/> <!-- ADDED -->
          <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
          <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor"/>
          <Interceptor className="org.apache.catalina.tribes.group.interceptors.StaticMembershipInterceptor">
            <Member className="org.apache.catalina.tribes.membership.StaticMember"
              port="4002"
              host="172.31.33.220"
              domain="delta-static"
              uniqueId="{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}"
            />
          </Interceptor>
        </Channel>
        <Valve className="org.apache.catalina.ha.tcp.ReplicationValve" filter=""/>
        <Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve"/>
        <ClusterListener className="org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener"/>
        <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener"/>
      </Cluster>
      

       

      You can see the <Membership> node is commented out (this is the multicast function). The TcpPingInterceptor and the StaticMembershipInterceptor are added. The receiver port on this instance is 4001 and on the remote instance it is 4002, so the static member entry on this instance uses port 4002 to contact the remote host, and vice versa. In other words, the remote instance uses the same <Cluster> node with the ports switched and the host IP address changed on the static interceptor. The uniqueId then rotates on each member, going from {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15} to {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,0}.
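
      To make the "ports switched" description concrete, here is a sketch of just the parts of the <Channel> that would differ on the remote instance. FIRST_INSTANCE_IP is a placeholder (the first instance's address is not given in this post), and the comments stand in for the unchanged elements.

      ```xml
      <Channel className="org.apache.catalina.tribes.group.GroupChannel">
        <!-- the remote instance listens on 4002 instead of 4001 -->
        <Receiver port="4002" autoBind="100" address="auto" selectorTimeout="5000" maxThreads="6" className="org.apache.catalina.tribes.transport.nio.NioReceiver"/>
        <!-- Sender and the other three Interceptor elements are unchanged -->
        <Interceptor className="org.apache.catalina.tribes.group.interceptors.StaticMembershipInterceptor">
          <!-- points back at the first instance: its receiver port, its IP (placeholder), and the rotated uniqueId -->
          <Member className="org.apache.catalina.tribes.membership.StaticMember"
            port="4001"
            host="FIRST_INSTANCE_IP"
            domain="delta-static"
            uniqueId="{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,0}"
          />
        </Interceptor>
      </Channel>
      ```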

       

      Of course, each additional member of the cluster will mean manual changes to each existing member (to add additional static <Member> entries), but that seems a small price to pay to not have to move our entire environment off AWS.
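
      For illustration only, this is roughly what the static interceptor on the first instance might look like with a third member added, assuming the interceptor accepts more than one <Member> child. The third member's port (4003), its host placeholder and its uniqueId are all hypothetical; the only requirement assumed here is that every member gets a distinct uniqueId. The other two instances would each need a matching pair of entries for their own peers.

      ```xml
      <Interceptor className="org.apache.catalina.tribes.group.interceptors.StaticMembershipInterceptor">
        <!-- existing second member, as in the config above -->
        <Member className="org.apache.catalina.tribes.membership.StaticMember"
          port="4002"
          host="172.31.33.220"
          domain="delta-static"
          uniqueId="{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}"
        />
        <!-- hypothetical third member: port, host and uniqueId are made up for this sketch -->
        <Member className="org.apache.catalina.tribes.membership.StaticMember"
          port="4003"
          host="THIRD_INSTANCE_IP"
          domain="delta-static"
          uniqueId="{2,3,4,5,6,7,8,9,10,11,12,13,14,15,0,1}"
        />
      </Interceptor>
      ```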

        • 1. Re: How To: CF11 Clustering without Multicast (AWS)
          carl type3 Community Member

          Hi Scott,

           

          Thanks for the very informative post. No doubt it will be helpful information for folks.

           

          A couple of comments, if I may?

           

          >default cfusion instance just can't be used as a member of a cluster but I had a hard time ever getting it to work. So both the local and remote instance use new CF11 instances created from within the Instance Manager.

           

          I think you can use the default cfusion instance, though I prefer not to. Like you, I keep the default instance apart from the clustered instances so I can still manage the instances or the cluster if the need arises.


          >AWS does not support broadcast nor multicast in EC2

           

          Very interesting. I wonder whether it was some kind of AWS EC2 security group denying the default CF multicast port traffic between CF instances (each CF instance on a separate EC2 instance, I am guessing).


          Regards, Carl.

          • 2. Re: How To: CF11 Clustering without Multicast (AWS)
            scottberry Community Member

            Thanks Carl. Maybe in my testing the default instance scenario never had the other proper configurations in place. Good to know.

             

            From the EC2 perspective, Amazon has commented on their forums that they do not allow multicast/broadcast traffic. For a few years they have been asking what it would be used for and soliciting opinions, but I haven't seen any movement on allowing it.

            • 3. Re: How To: CF11 Clustering without Multicast (AWS)
              carl type3 Community Member

              Hope I am not hijacking your excellent post.

               

              Some details to add from my findings in an AWS EC2 environment.

               

              From the CMD prompt, a clustered CF11 instance starting:

               

              Aug 26, 2014 11:23:44 PM org.apache.catalina.ha.session.DeltaManager startInternal
              INFO: Register manager / to cluster element Host with name localhost
              Aug 26, 2014 11:23:44 PM org.apache.catalina.ha.session.DeltaManager startInternal
              INFO: Starting clustering manager at /
              Aug 26, 2014 11:23:44 PM org.apache.catalina.ha.session.DeltaManager getAllClusterSessions
              INFO: Manager [/], requesting session state from org.apache.catalina.tribes.membership.StaticMember[tcp://172.31.21.168:4001,172.31.21.168,4001, alive=0, securePort=-1, UDP Port=-1, id={1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 }, payload={}, command={}, domain={100 101 108 116 97 45 115 116 97 ...(12)}, ]. This operation will timeout if no session state has been received within 60 seconds.
              Aug 26, 2014 11:23:45 PM org.apache.catalina.ha.session.DeltaManager waitForSendAllSessions
              INFO: Manager [/]; session state send at 8/26/14 11:23 PM received in 125 ms.
              Aug 26, 2014 11:23:45 PM org.apache.catalina.ha.session.JvmRouteBinderValve startInternal
              INFO: JvmRouteBinderValve started

               

               

              From the CMD prompt, CF11 instance details when the other cluster member has been restarted:

               

              Aug 26, 2014 11:22:47 PM org.apache.catalina.ha.tcp.SimpleTcpCluster memberDisappeared
              INFO: Received member disappeared:org.apache.catalina.tribes.membership.StaticMember[tcp://172.31.25.175:4002,172.31.25.175,4002, alive=0, securePort=-1, UDP Port=-1, id={0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 }, payload={}, command={}, domain={100 101 108 116 97 45 115 116 97 ...(12)}, ]
              Aug 26, 2014 11:23:06 PM org.apache.catalina.ha.tcp.SimpleTcpCluster memberAdded
              INFO: Replication member added:org.apache.catalina.tribes.membership.StaticMember[tcp://172.31.25.175:4002,172.31.25.175,4002, alive=0, securePort=-1, UDP Port=-1, id={0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 }, payload={}, command={}, domain={100 101 108 116 97 45 115 116 97 ...(12)}, ]
              Aug 26, 2014 11:23:06 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector performBasicCheck
              INFO: Suspect member, confirmed alive.[org.apache.catalina.tribes.membership.StaticMember[tcp://172.31.25.175:4002,172.31.25.175,4002, alive=0, securePort=-1, UDP Port=-1, id={0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 }, payload={}, command={}, domain={100 101 108 116 97 45 115 116 97 ...(12)}, ]]

               

               


              Running CF11 via services.msc (as you normally would), similar details are recorded in ColdFusion11\clustered_instance\logs\coldfusion-error.log. The latter part of the log shows when the other clustered instance has been stopped and started.

               

              Aug 26, 2014 11:40:31 PM org.apache.catalina.ha.session.DeltaManager startInternal
              INFO: Register manager / to cluster element Host with name localhost
              Aug 26, 2014 11:40:31 PM org.apache.catalina.ha.session.DeltaManager startInternal
              INFO: Starting clustering manager at /
              Aug 26, 2014 11:40:31 PM org.apache.catalina.ha.session.DeltaManager getAllClusterSessions
              INFO: Manager [/], requesting session state from org.apache.catalina.tribes.membership.StaticMember[tcp://172.31.21.168:4001,172.31.21.168,4001, alive=0, securePort=-1, UDP Port=-1, id={1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 }, payload={}, command={}, domain={100 101 108 116 97 45 115 116 97 ...(12)}, ]. This operation will timeout if no session state has been received within 60 seconds.
              Aug 26, 2014 11:40:31 PM org.apache.catalina.ha.session.DeltaManager waitForSendAllSessions
              INFO: Manager [/]; session state send at 8/26/14 11:40 PM received in 141 ms.
              Aug 26, 2014 11:40:31 PM org.apache.catalina.ha.session.JvmRouteBinderValve startInternal
              INFO: JvmRouteBinderValve started
              Aug 26, 2014 11:40:31 PM org.apache.coyote.AbstractProtocol start
              INFO: Starting ProtocolHandler ["http-bio-8501"]
              Aug 26, 2014 11:40:31 PM org.apache.coyote.AbstractProtocol start
              INFO: Starting ProtocolHandler ["ajp-bio-8012"]
              Aug 26, 2014 11:40:31 PM com.adobe.coldfusion.launcher.Launcher run
              INFO: Server startup in 44274 ms
              Aug 26, 2014 11:42:04 PM org.apache.catalina.ha.tcp.SimpleTcpCluster memberDisappeared
              INFO: Received member disappeared:org.apache.catalina.tribes.membership.StaticMember[tcp://172.31.21.168:4001,172.31.21.168,4001, alive=0, securePort=-1, UDP Port=-1, id={1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 }, payload={}, command={}, domain={100 101 108 116 97 45 115 116 97 ...(12)}, ]
              Aug 26, 2014 11:42:23 PM org.apache.catalina.ha.tcp.SimpleTcpCluster memberAdded
              INFO: Replication member added:org.apache.catalina.tribes.membership.StaticMember[tcp://172.31.21.168:4001,172.31.21.168,4001, alive=0, securePort=-1, UDP Port=-1, id={1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 }, payload={}, command={}, domain={100 101 108 116 97 45 115 116 97 ...(12)}, ]
              Aug 26, 2014 11:42:23 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector performBasicCheck
              INFO: Suspect member, confirmed alive.[org.apache.catalina.tribes.membership.StaticMember[tcp://172.31.21.168:4001,172.31.21.168,4001, alive=0, securePort=-1, UDP Port=-1, id={1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 }, payload={}, command={}, domain={100 101 108 116 97 45 115 116 97 ...(12)}, ]]

               

               

              Hope that adds to the usefulness of this thread.

               

              Regards, Carl.