4 Replies Latest reply on Nov 26, 2014 5:47 AM by GuitsBoy

    Cluster Replication issues in CF10 on RHEL6

    GuitsBoy Level 1

      We recently updated our four physical servers to CF10 Update 14 and rebooted them all so the latest kernel and RHEL6 security updates would take effect. Each server runs two worker instances, cfusion1 and cfusion2, in a round-robin cluster.

       

      The servers came back fine after the reboot and ran normally until the following day. Since then, restarting any CF instance seems to lock that instance up. If I bring both CF instances down and then both back up, the cluster *USUALLY* comes back fine, although sometimes it does not and I have to reboot the box. To make matters even weirder, I can sometimes restart an instance, regardless of whether both are up or down, if I remove the secondary IP address on em1:1. This issue exists on all four physical servers. I have pulled one out of our web cluster to troubleshoot while the other three limp along.
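Because removing the em1:1 alias sometimes lets an instance restart, it may be worth checking which IPv4 addresses the JVM sees and what it resolves its own host name to. Our assumption (not verified against the CF10 cluster configuration) is that a Tribes Receiver configured with address="auto" binds to the address returned by InetAddress.getLocalHost(), so a host name that resolves to the secondary IP could change which address the instances advertise to each other. The class name InterfaceReport is hypothetical; the sketch uses only standard java.net calls.

```java
import java.io.UncheckedIOException;
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical diagnostic class, not part of ColdFusion or Tomcat.
final class InterfaceReport {

    // Lists "interface -> IPv4 address" for every interface that is up,
    // including Linux IP aliases such as em1:1 (exposed as sub-interfaces).
    static List<String> ipv4Addresses() {
        Set<String> lines = new LinkedHashSet<>(); // drops exact duplicate lines
        try {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            if (nics == null) {
                return new ArrayList<>(); // no interfaces visible to the JVM
            }
            for (NetworkInterface nic : Collections.list(nics)) {
                if (!nic.isUp()) {
                    continue;
                }
                addIpv4(nic, lines);
                for (NetworkInterface alias : Collections.list(nic.getSubInterfaces())) {
                    addIpv4(alias, lines);
                }
            }
        } catch (SocketException e) {
            throw new UncheckedIOException(e);
        }
        return new ArrayList<>(lines);
    }

    private static void addIpv4(NetworkInterface nic, Set<String> out) {
        for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
            if (addr instanceof Inet4Address) {
                out.add(nic.getName() + " -> " + addr.getHostAddress());
            }
        }
    }

    // Address the JVM resolves for its own host name; empty if it cannot be resolved.
    static Optional<String> localHostAddress() {
        try {
            return Optional.of(InetAddress.getLocalHost().getHostAddress());
        } catch (UnknownHostException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println("getLocalHost(): " + localHostAddress().orElse("<unresolved>"));
        ipv4Addresses().forEach(System.out::println);
    }
}
```

Running it with and without the em1:1 alias and comparing the getLocalHost() line against the "Receiver Server Socket bound to" address in the log would show whether the alias changes what the cluster binds to.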

       

      The major issue is that when one of the instances hangs, it does so in a zombie state: half dead, but not dead enough for the Tomcat cluster to expire the instance. That means half my requests are processed by the working instance, while the other half queue up indefinitely, eventually bringing my web server down completely. Shutting down both instances and then bringing both up again usually works, but it's not something I like to do on production machines. And occasionally, the instance won't come back at all. These machines have become painfully unstable.
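The "half my requests" symptom is roughly what a strict round-robin balancer would produce over two workers when one is hung but still considered alive: every other request is routed to the zombie. The toy model below only illustrates that arithmetic. It assumes plain round-robin with no sticky sessions, retries, or timeouts, any of which the real connector may apply, and the names Worker and RoundRobinModel are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of a round-robin load balancer. Worker and RoundRobinModel are
// hypothetical names; this is not the ColdFusion/Tomcat connector's code.
final class RoundRobinModel {

    static final class Worker {
        final String name;
        final boolean markedAlive; // what the balancer believes
        final boolean responsive;  // whether the instance actually answers

        Worker(String name, boolean markedAlive, boolean responsive) {
            this.name = name;
            this.markedAlive = markedAlive;
            this.responsive = responsive;
        }
    }

    // Sends n requests round robin across the workers the balancer still
    // considers alive and counts how many land on an unresponsive one.
    static int hungRequests(List<Worker> workers, int n) {
        List<Worker> eligible = new ArrayList<>();
        for (Worker w : workers) {
            if (w.markedAlive) {
                eligible.add(w);
            }
        }
        if (eligible.isEmpty()) {
            return n; // no eligible worker: every request fails
        }
        int hung = 0;
        for (int i = 0; i < n; i++) {
            Worker target = eligible.get(i % eligible.size());
            if (!target.responsive) {
                hung++;
            }
        }
        return hung;
    }

    public static void main(String[] args) {
        Worker cfusion1 = new Worker("cfusion1", true, true);
        Worker zombie = new Worker("cfusion2", true, false);   // hung but not expired
        Worker expired = new Worker("cfusion2", false, false); // hung and removed
        System.out.println(hungRequests(Arrays.asList(cfusion1, zombie), 100));  // 50
        System.out.println(hungRequests(Arrays.asList(cfusion1, expired), 100)); // 0
    }
}
```

The contrast between the two cases is the point: once the hung member is actually expired from the rotation, the surviving instance takes all the traffic instead of half of it being stranded.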

       

       

      When I attempt to restart cfusion2, here's what I see in coldfusion-error.log:

       

      Nov 24, 2014 6:02:58 PM org.apache.catalina.core.AprLifecycleListener init

      INFO: The APR based Apache Tomcat Native library which allows optimal performance in production environments was not found on the java.library.path: /opt/coldfusion10/jre/lib/amd64/server:/opt/coldfusion10/jre/lib/amd64:/opt/coldfusion10/jre/../lib/amd64:/opt/coldfusion10/cfusion2/lib:/opt/coldfusion10/cfusion2/lib/_ilnx21/bin:/opt/coldfusion10/cfusion2/lib/international::/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib

      Nov 24, 2014 6:02:59 PM org.apache.coyote.AbstractProtocol init

      INFO: Initializing ProtocolHandler ["http-bio-8502"]

      Nov 24, 2014 6:02:59 PM org.apache.coyote.AbstractProtocol init

      INFO: Initializing ProtocolHandler ["http-bio-8447"]

      Nov 24, 2014 6:03:00 PM org.apache.coyote.AbstractProtocol init

      INFO: Initializing ProtocolHandler ["ajp-bio-8014"]

      Nov 24, 2014 6:03:00 PM org.apache.catalina.core.StandardService startInternal

      INFO: Starting service Catalina

      Nov 24, 2014 6:03:00 PM org.apache.catalina.core.StandardEngine startInternal

      INFO: Starting Servlet Engine: Apache Tomcat/7.0.54

      Nov 24, 2014 6:03:00 PM org.apache.catalina.ha.tcp.SimpleTcpCluster startInternal

      INFO: Cluster is about to start

      Nov 24, 2014 6:03:00 PM org.apache.catalina.tribes.transport.ReceiverBase bind

      INFO: Receiver Server Socket bound to:/10.10.240.104:4002

      Nov 24, 2014 6:03:00 PM org.apache.catalina.tribes.membership.McastServiceImpl setupSocket

      INFO: Setting cluster mcast soTimeout to 500

      Nov 24, 2014 6:03:00 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers

      INFO: Sleeping for 1000 milliseconds to establish cluster membership, start level:4

      Nov 24, 2014 6:03:00 PM org.apache.catalina.ha.tcp.SimpleTcpCluster memberAdded

      INFO: Replication member added:org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4001,{10, 10, 240, 104},4001, alive=416205, securePort=-1, UDP Port=-1, id={-40 103 -88 33 -118 2 70 76 -125 -43 102 49 -86 -103 123 -42 }, payload={}, command={}, domain={}, ]

      Nov 24, 2014 6:03:01 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers

      INFO: Done sleeping, membership established, start level:4

      Nov 24, 2014 6:03:01 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers

      INFO: Sleeping for 1000 milliseconds to establish cluster membership, start level:8

      Nov 24, 2014 6:03:01 PM org.apache.catalina.tribes.io.BufferPool getBufferPool

      INFO: Created a buffer pool with max size:104857600 bytes of type:org.apache.catalina.tribes.io.BufferPool15Impl

      Nov 24, 2014 6:03:02 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers

      INFO: Done sleeping, membership established, start level:8

      Nov 24, 2014 6:03:04 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived

      WARNING: Context manager doesn't exist:localhost#/

      Nov 24, 2014 6:03:05 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived

      WARNING: Context manager doesn't exist:localhost#/

      Nov 24, 2014 6:03:06 PM org.apache.catalina.ha.session.DeltaManager startInternal

      INFO: Register manager localhost#/ to cluster element Engine with name Catalina

      Nov 24, 2014 6:03:06 PM org.apache.catalina.ha.session.DeltaManager startInternal

      INFO: Starting clustering manager at localhost#/

      Nov 24, 2014 6:03:36 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared

      INFO: Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4001,{10, 10, 240, 104},4001, alive=451725, securePort=-1, UDP Port=-1, id={-40 103 -88 33 -118 2 70 76 -125 -43 102 49 -86 -103 123 -42 }, payload={}, command={}, domain={}, ]] message. Will verify.

      Nov 24, 2014 6:03:36 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared

      INFO: Verification complete. Member still alive[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4001,{10, 10, 240, 104},4001, alive=451725, securePort=-1, UDP Port=-1, id={-40 103 -88 33 -118 2 70 76 -125 -43 102 49 -86 -103 123 -42 }, payload={}, command={}, domain={}, ]]

      Nov 24, 2014 6:03:36 PM org.apache.catalina.ha.tcp.SimpleTcpCluster send

      SEVERE: Unable to send message through cluster sender.

      org.apache.catalina.tribes.ChannelException: Operation has timed out(30000 ms.).; Faulty members:tcp://{10, 10, 240, 104}:4001;

              at org.apache.catalina.tribes.transport.nio.ParallelNioSender.sendMessage(ParallelNioSender.java:109)

              at org.apache.catalina.tribes.transport.nio.PooledParallelSender.sendMessage(PooledParallelSender.java:54)

              at org.apache.catalina.tribes.transport.ReplicationTransmitter.sendMessage(ReplicationTransmitter.java:78)

              at org.apache.catalina.tribes.group.ChannelCoordinator.sendMessage(ChannelCoordinator.java:77)

              at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)

              at org.apache.catalina.tribes.group.interceptors.MessageDispatchInterceptor.sendMessage(MessageDispatchInterceptor.java:77)

              at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)

              at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.sendMessage(TcpFailureDetector.java:93)

              at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)

              at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:224)

              at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:182)

              at org.apache.catalina.ha.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:837)

              at org.apache.catalina.ha.session.DeltaManager.getAllClusterSessions(DeltaManager.java:789)

              at org.apache.catalina.ha.session.DeltaManager.startInternal(DeltaManager.java:756)

              at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)

              at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5476)

              at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)

              at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1559)

              at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1549)

              at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)

              at java.util.concurrent.FutureTask.run(FutureTask.java:138)

              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

              at java.lang.Thread.run(Thread.java:662)

      Nov 24, 2014 6:03:36 PM org.apache.catalina.ha.session.DeltaManager getAllClusterSessions

      INFO: Manager [localhost#/], requesting session state from org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4001,{10, 10, 240, 104},4001, alive=451725, securePort=-1, UDP Port=-1, id={-40 103 -88 33 -118 2 70 76 -125 -43 102 49 -86 -103 123 -42 }, payload={}, command={}, domain={}, ]. This operation will timeout if no session state has been received within 60 seconds.

      Nov 24, 2014 6:03:36 PM org.apache.catalina.ha.session.DeltaManager waitForSendAllSessions

      INFO: Manager [localhost#/]; session state send at 11/24/14 6:03 PM received in 30,264 ms.

      Nov 24, 2014 6:03:36 PM org.apache.catalina.session.StandardSession tellNew

      SEVERE: Session event listener threw exception

      java.lang.NullPointerException

              at coldfusion.bootstrap.HttpFlexSessionBootstrap.getListener(HttpFlexSessionBootstrap.java:154)

              at coldfusion.bootstrap.HttpFlexSessionBootstrap.sessionCreated(HttpFlexSessionBootstrap.java:69)

              at org.apache.catalina.session.StandardSession.tellNew(StandardSession.java:422)

              at org.apache.catalina.session.StandardSession.setId(StandardSession.java:394)

              at org.apache.catalina.ha.session.DeltaSession.setId(DeltaSession.java:275)

              at org.apache.catalina.ha.session.DeltaManager.handleSESSION_CREATED(DeltaManager.java:1336)

              at org.apache.catalina.ha.session.DeltaManager.messageReceived(DeltaManager.java:1214)

              at org.apache.catalina.ha.session.DeltaManager.getAllClusterSessions(DeltaManager.java:803)

              at org.apache.catalina.ha.session.DeltaManager.startInternal(DeltaManager.java:756)

              at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)

              at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5476)

              at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)

              at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1559)

              at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1549)

              at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)

              at java.util.concurrent.FutureTask.run(FutureTask.java:138)

              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

              at java.lang.Thread.run(Thread.java:662)
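The log shows cfusion2 initializing its connectors (http-bio-8502, http-bio-8447, ajp-bio-8014) before the cluster session transfer stalls. As we understand Tomcat 7's defaults (bindOnInit), connector sockets are bound during init, so a port can accept TCP connections before the connector is started and requests on it simply wait. A probe with both a connect timeout and a read timeout can separate "not listening" from "listening but not answering". HttpProbe and its Status values are hypothetical names; port 8502 comes from the log, and probing the path "/" is an assumption.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.SocketTimeoutException;
import java.net.URL;

// Hypothetical probe; not part of ColdFusion or Tomcat.
final class HttpProbe {

    enum Status { OK, HTTP_ERROR, TIMED_OUT, UNREACHABLE }

    // Issues a GET to an http(s) URL with both a connect and a read timeout.
    // A connect timeout is also reported as TIMED_OUT; a refused connection
    // or malformed URL is reported as UNREACHABLE.
    static Status probe(String url, int timeoutMs) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setConnectTimeout(timeoutMs);
            conn.setReadTimeout(timeoutMs);
            conn.setUseCaches(false);
            int code = conn.getResponseCode();
            return code >= 500 ? Status.HTTP_ERROR : Status.OK;
        } catch (SocketTimeoutException e) {
            return Status.TIMED_OUT; // usually: port accepted the connection but never answered
        } catch (IOException e) {
            return Status.UNREACHABLE;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        // 8502 is cfusion2's HTTP connector port from the log above.
        System.out.println(probe("http://127.0.0.1:8502/", 5000));
    }
}
```

A TIMED_OUT result against an instance whose port is open would match the zombie behaviour described above, whereas UNREACHABLE points to an instance that is simply down.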

       

      Then, when I try to stop the instance, I get this sign of a half-dead process:

       

      #/opt/coldfusion10/cfusion2/bin/coldfusion stop

      Stopping ColdFusion 10 server instance named cfusion2, please wait

      Nov 24, 2014 6:06:03 PM com.adobe.coldfusion.launcher.Launcher stopServer

      SEVERE: Shutdown Port 8009is not active. Stop the server only after it is started.

      ColdFusion 10 server instance named cfusion2 has been stopped
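The stop script's complaint suggests it found nothing listening on the shutdown port. In stock Tomcat 7 the shutdown port is opened only after the server has finished starting, so one plausible reading (not confirmed here) is that cfusion2 never completed startup, which would fit both stack traces above passing through DeltaManager.startInternal during context startup. A plain TCP connect is enough to check this; the class name PortCheck is hypothetical. Connecting without sending the shutdown command does not stop Tomcat, though it may log an "invalid command" warning.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper; not part of ColdFusion's launcher.
final class PortCheck {

    // True if something accepts a TCP connection on host:port within timeoutMs.
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or unresolvable host
        }
    }

    public static void main(String[] args) {
        // 8009 is the shutdown port named in the stop output above.
        System.out.println("shutdown port listening: " + isListening("127.0.0.1", 8009, 1000));
    }
}
```

If the shutdown port stays closed while the HTTP/AJP ports accept connections, the instance is in the in-between state the stop script is reporting.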

       

       

      The working cluster instance, cfusion1, shows this in its coldfusion-error.log:

       

      Nov 24, 2014 6:06:17 PM org.apache.catalina.tribes.transport.nio.ParallelNioSender doLoop

      WARNING: Not retrying send for:tcp://{10, 10, 240, 104}:4002; Sender is disconnected.

      Nov 24, 2014 6:06:17 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared

      INFO: Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4002,{10, 10, 240, 104},4002, alive=192591, securePort=-1, UDP Port=-1, id={-96 126 -95 87 -79 5 76 -125 -101 -68 -56 -60 90 -22 -1 7 }, payload={}, command={}, domain={}, ]] message. Will verify.

      Nov 24, 2014 6:06:17 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared

      INFO: Verification complete. Member already disappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4002,{10, 10, 240, 104},4002, alive=192591, securePort=-1, UDP Port=-1, id={-96 126 -95 87 -79 5 76 -125 -101 -68 -56 -60 90 -22 -1 7 }, payload={}, command={}, domain={}, ]]

      Nov 24, 2014 6:06:17 PM org.apache.catalina.ha.tcp.SimpleTcpCluster send

      SEVERE: Unable to send message through cluster sender.

      org.apache.catalina.tribes.ChannelException: Send failed, and sender is disconnected. Not retrying.; Faulty members:tcp://{10, 10, 240, 104}:4002;

              at org.apache.catalina.tribes.transport.nio.ParallelNioSender.doLoop(ParallelNioSender.java:171)

              at org.apache.catalina.tribes.transport.nio.ParallelNioSender.sendMessage(ParallelNioSender.java:89)

              at org.apache.catalina.tribes.transport.nio.PooledParallelSender.sendMessage(PooledParallelSender.java:54)

              at org.apache.catalina.tribes.transport.ReplicationTransmitter.sendMessage(ReplicationTransmitter.java:78)

              at org.apache.catalina.tribes.group.ChannelCoordinator.sendMessage(ChannelCoordinator.java:77)

              at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)

              at org.apache.catalina.tribes.group.interceptors.MessageDispatchInterceptor.sendMessage(MessageDispatchInterceptor.java:77)

              at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)

              at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.sendMessage(TcpFailureDetector.java:93)

              at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)

              at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:224)

              at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:182)

              at org.apache.catalina.ha.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:843)

              at org.apache.catalina.ha.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:815)

              at org.apache.catalina.ha.session.DeltaManager.send(DeltaManager.java:497)

              at org.apache.catalina.ha.session.DeltaManager.sendCreateSession(DeltaManager.java:487)

              at org.apache.catalina.ha.session.DeltaManager.createSession(DeltaManager.java:463)

              at org.apache.catalina.ha.session.DeltaManager.createSession(DeltaManager.java:450)

              at org.apache.catalina.connector.Request.doGetSession(Request.java:2947)

              at org.apache.catalina.connector.Request.getSession(Request.java:2311)

              at org.apache.catalina.connector.RequestFacade.getSession(RequestFacade.java:897)

              at coldfusion.runtime.AppHelper.setupJ2eeSessionScope(AppHelper.java:974)

              at coldfusion.runtime.AppHelper.setupSessionScope(AppHelper.java:1067)

              at coldfusion.filter.ApplicationFilter.invoke(ApplicationFilter.java:361)

              at coldfusion.filter.RequestMonitorFilter.invoke(RequestMonitorFilter.java:48)

              at coldfusion.filter.MonitoringFilter.invoke(MonitoringFilter.java:40)

              at coldfusion.filter.PathFilter.invoke(PathFilter.java:112)

              at coldfusion.filter.ExceptionFilter.invoke(ExceptionFilter.java:94)

              at coldfusion.filter.BrowserDebugFilter.invoke(BrowserDebugFilter.java:79)

              at coldfusion.filter.ClientScopePersistenceFilter.invoke(ClientScopePersistenceFilter.java:28)

              at coldfusion.filter.BrowserFilter.invoke(BrowserFilter.java:38)

              at coldfusion.filter.NoCacheFilter.invoke(NoCacheFilter.java:58)

              at coldfusion.filter.GlobalsFilter.invoke(GlobalsFilter.java:38)

              at coldfusion.filter.DatasourceFilter.invoke(DatasourceFilter.java:22)

              at coldfusion.filter.CachingFilter.invoke(CachingFilter.java:62)

              at coldfusion.CfmServlet.service(CfmServlet.java:219)

              at coldfusion.bootstrap.BootstrapServlet.service(BootstrapServlet.java:89)

              at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)

              at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)

              at coldfusion.monitor.event.MonitoringServletFilter.doFilter(MonitoringServletFilter.java:42)

              at coldfusion.bootstrap.BootstrapFilter.doFilter(BootstrapFilter.java:46)

              at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)

              at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)

              at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)

              at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)

              at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:501)

              at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)

              at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)

              at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)

              at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)

              at org.apache.catalina.ha.session.JvmRouteBinderValve.invoke(JvmRouteBinderValve.java:218)

              at org.apache.catalina.ha.tcp.ReplicationValve.invoke(ReplicationValve.java:333)

              at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:422)

              at org.apache.coyote.ajp.AjpProcessor.process(AjpProcessor.java:199)

              at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)

              at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:314)

              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

              at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)

              at java.lang.Thread.run(Thread.java:662)

      Caused by: java.io.IOException: Connection reset by peer

              at sun.nio.ch.FileDispatcher.read0(Native Method)

              at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)

              at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)

              at sun.nio.ch.IOUtil.read(IOUtil.java:171)

              at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)

              at org.apache.catalina.tribes.transport.nio.NioSender.read(NioSender.java:169)

              at org.apache.catalina.tribes.transport.nio.NioSender.process(NioSender.java:119)

              at org.apache.catalina.tribes.transport.nio.ParallelNioSender.doLoop(ParallelNioSender.java:142)

              ... 59 more

      Nov 24, 2014 6:06:18 PM org.apache.catalina.tribes.transport.nio.ParallelNioSender doLoop

      WARNING: Not retrying send for:tcp://{10, 10, 240, 104}:4002; Sender is disconnected.

      Nov 24, 2014 6:06:18 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared

      INFO: Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4002,{10, 10, 240, 104},4002, alive=192591, securePort=-1, UDP Port=-1, id={-96 126 -95 87 -79 5 76 -125 -101 -68 -56 -60 90 -22 -1 7 }, payload={}, command={}, domain={}, ]] message. Will verify.

      Nov 24, 2014 6:06:18 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared

      INFO: Verification complete. Member already disappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4002,{10, 10, 240, 104},4002, alive=192591, securePort=-1, UDP Port=-1, id={-96 126 -95 87 -79 5 76 -125 -101 -68 -56 -60 90 -22 -1 7 }, payload={}, command={}, domain={}, ]]

      Nov 24, 2014 6:06:18 PM org.apache.catalina.ha.tcp.SimpleTcpCluster send

      SEVERE: Unable to send message through cluster sender.

      org.apache.catalina.tribes.ChannelException: Send failed, and sender is disconnected. Not retrying.; Faulty members:tcp://{10, 10, 240, 104}:4002;

              at org.apache.catalina.tribes.transport.nio.ParallelNioSender.doLoop(ParallelNioSender.java:171)

              at org.apache.catalina.tribes.transport.nio.ParallelNioSender.sendMessage(ParallelNioSender.java:89)

              at org.apache.catalina.tribes.transport.nio.PooledParallelSender.sendMessage(PooledParallelSender.java:54)

              at org.apache.catalina.tribes.transport.ReplicationTransmitter.sendMessage(ReplicationTransmitter.java:78)

              at org.apache.catalina.tribes.group.ChannelCoordinator.sendMessage(ChannelCoordinator.java:77)

              at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)

              at org.apache.catalina.tribes.group.interceptors.MessageDispatchInterceptor.sendMessage(MessageDispatchInterceptor.java:77)

              at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)

              at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.sendMessage(TcpFailureDetector.java:93)

              at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)

              at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:224)

              at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:182)

              at org.apache.catalina.ha.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:843)

              at org.apache.catalina.ha.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:815)

              at org.apache.catalina.ha.tcp.ReplicationValve.send(ReplicationValve.java:539)

              at org.apache.catalina.ha.tcp.ReplicationValve.sendMessage(ReplicationValve.java:524)

              at org.apache.catalina.ha.tcp.ReplicationValve.sendSessionReplicationMessage(ReplicationValve.java:506)

              at org.apache.catalina.ha.tcp.ReplicationValve.sendReplicationMessage(ReplicationValve.java:419)

              at org.apache.catalina.ha.tcp.ReplicationValve.invoke(ReplicationValve.java:343)

              at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:422)

              at org.apache.coyote.ajp.AjpProcessor.process(AjpProcessor.java:199)

              at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)

              at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:314)

              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

              at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)

              at java.lang.Thread.run(Thread.java:662)

      Caused by: java.io.IOException: Connection reset by peer

              at sun.nio.ch.FileDispatcher.read0(Native Method)

              at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)

              at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)

              at sun.nio.ch.IOUtil.read(IOUtil.java:171)

              at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)

              at org.apache.catalina.tribes.transport.nio.NioSender.read(NioSender.java:169)

              at org.apache.catalina.tribes.transport.nio.NioSender.process(NioSender.java:119)

              at org.apache.catalina.tribes.transport.nio.ParallelNioSender.doLoop(ParallelNioSender.java:142)

              ... 26 more

      Nov 24, 2014 6:06:19 PM org.apache.catalina.tribes.transport.nio.ParallelNioSender doLoop

      WARNING: Not retrying send for:tcp://{10, 10, 240, 104}:4002; Sender is disconnected.

      Nov 24, 2014 6:06:19 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared

      INFO: Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4002,{10, 10, 240, 104},4002, alive=192591, securePort=-1, UDP Port=-1, id={-96 126 -95 87 -79 5 76 -125 -101 -68 -56 -60 90 -22 -1 7 }, payload={}, command={}, domain={}, ]] message. Will verify.

      Nov 24, 2014 6:06:19 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared

      INFO: Verification complete. Member already disappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4002,{10, 10, 240, 104},4002, alive=192591, securePort=-1, UDP Port=-1, id={-96 126 -95 87 -79 5 76 -125 -101 -68 -56 -60 90 -22 -1 7 }, payload={}, command={}, domain={}, ]]

      Nov 24, 2014 6:06:19 PM org.apache.catalina.ha.tcp.SimpleTcpCluster send

      SEVERE: Unable to send message through cluster sender.

      org.apache.catalina.tribes.ChannelException: Send failed, and sender is disconnected. Not retrying.; Faulty members:tcp://{10, 10, 240, 104}:4002;

              at org.apache.catalina.tribes.transport.nio.ParallelNioSender.doLoop(ParallelNioSender.java:171)

              at org.apache.catalina.tribes.transport.nio.ParallelNioSender.sendMessage(ParallelNioSender.java:89)

              at org.apache.catalina.tribes.transport.nio.PooledParallelSender.sendMessage(PooledParallelSender.java:54)

              at org.apache.catalina.tribes.transport.ReplicationTransmitter.sendMessage(ReplicationTransmitter.java:78)

              at org.apache.catalina.tribes.group.ChannelCoordinator.sendMessage(ChannelCoordinator.java:77)

              at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)

              at org.apache.catalina.tribes.group.interceptors.MessageDispatchInterceptor.sendMessage(MessageDispatchInterceptor.java:77)

              at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)

              at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.sendMessage(TcpFailureDetector.java:93)

              at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)

              at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:224)

              at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:182)

              at org.apache.catalina.ha.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:843)

              at org.apache.catalina.ha.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:815)

              at org.apache.catalina.ha.tcp.ReplicationValve.send(ReplicationValve.java:539)

              at org.apache.catalina.ha.tcp.ReplicationValve.sendMessage(ReplicationValve.java:524)

              at org.apache.catalina.ha.tcp.ReplicationValve.sendSessionReplicationMessage(ReplicationValve.java:506)

              at org.apache.catalina.ha.tcp.ReplicationValve.sendReplicationMessage(ReplicationValve.java:419)

              at org.apache.catalina.ha.tcp.ReplicationValve.invoke(ReplicationValve.java:343)

              at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:422)

              at org.apache.coyote.ajp.AjpProcessor.process(AjpProcessor.java:199)

              at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)

              at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:314)

              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

              at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)

              at java.lang.Thread.run(Thread.java:662)

      Caused by: java.io.IOException: Connection reset by peer

              at sun.nio.ch.FileDispatcher.read0(Native Method)

              at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)

              at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)

              at sun.nio.ch.IOUtil.read(IOUtil.java:171)

              at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)

              at org.apache.catalina.tribes.transport.nio.NioSender.read(NioSender.java:169)

              at org.apache.catalina.tribes.transport.nio.NioSender.process(NioSender.java:119)

              at org.apache.catalina.tribes.transport.nio.ParallelNioSender.doLoop(ParallelNioSender.java:142)

              ... 26 more

      Nov 24, 2014 6:06:19 PM org.apache.catalina.tribes.transport.nio.ParallelNioSender doLoop

      WARNING: Not retrying send for:tcp://{10, 10, 240, 104}:4002; Sender is disconnected.

      Nov 24, 2014 6:06:19 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared

      INFO: Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4002,{10, 10, 240, 104},4002, alive=192591, securePort=-1, UDP Port=-1, id={-96 126 -95 87 -79 5 76 -125 -101 -68 -56 -60 90 -22 -1 7 }, payload={}, command={}, domain={}, ]] message. Will verify.

      Nov 24, 2014 6:06:19 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared

      INFO: Verification complete. Member already disappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4002,{10, 10, 240, 104},4002, alive=192591, securePort=-1, UDP Port=-1, id={-96 126 -95 87 -79 5 76 -125 -101 -68 -56 -60 90 -22 -1 7 }, payload={}, command={}, domain={}, ]]

      Nov 24, 2014 6:06:19 PM org.apache.catalina.ha.tcp.SimpleTcpCluster send

      SEVERE: Unable to send message through cluster sender.

      org.apache.catalina.tribes.ChannelException: Send failed, and sender is disconnected. Not retrying.; Faulty members:tcp://{10, 10, 240, 104}:4002;

              at org.apache.catalina.tribes.transport.nio.ParallelNioSender.doLoop(ParallelNioSender.java:171)

              at org.apache.catalina.tribes.transport.nio.ParallelNioSender.sendMessage(ParallelNioSender.java:89)

              at org.apache.catalina.tribes.transport.nio.PooledParallelSender.sendMessage(PooledParallelSender.java:54)

              at org.apache.catalina.tribes.transport.ReplicationTransmitter.sendMessage(ReplicationTransmitter.java:78)

              at org.apache.catalina.tribes.group.ChannelCoordinator.sendMessage(ChannelCoordinator.java:77)

              at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)

              at org.apache.catalina.tribes.group.interceptors.MessageDispatchInterceptor.sendMessage(MessageDispatchInterceptor.java:77)

              at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)

              at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.sendMessage(TcpFailureDetector.java:93)

              at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)

              at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:224)

              at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:182)

              at org.apache.catalina.ha.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:843)

              at org.apache.catalina.ha.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:815)

              at org.apache.catalina.ha.tcp.ReplicationValve.send(ReplicationValve.java:539)

              at org.apache.catalina.ha.tcp.ReplicationValve.sendMessage(ReplicationValve.java:524)

              at org.apache.catalina.ha.tcp.ReplicationValve.sendSessionReplicationMessage(ReplicationValv e.java:506)

              at org.apache.catalina.ha.tcp.ReplicationValve.sendReplicationMessage(ReplicationValve.java: 419)

              at org.apache.catalina.ha.tcp.ReplicationValve.invoke(ReplicationValve.java:343)

              at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:422)

              at org.apache.coyote.ajp.AjpProcessor.process(AjpProcessor.java:199)

              at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.jav a:607)

              at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:314)

              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

              at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)

              at java.lang.Thread.run(Thread.java:662)

      Caused by: java.io.IOException: Connection reset by peer

              at sun.nio.ch.FileDispatcher.read0(Native Method)

              at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)

              at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)

              at sun.nio.ch.IOUtil.read(IOUtil.java:171)

              at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)

              at org.apache.catalina.tribes.transport.nio.NioSender.read(NioSender.java:169)

              at org.apache.catalina.tribes.transport.nio.NioSender.process(NioSender.java:119)

              at org.apache.catalina.tribes.transport.nio.ParallelNioSender.doLoop(ParallelNioSender.java: 142)

              ... 26 more

      Nov 24, 2014 6:06:23 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared

      INFO: Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4002,{10, 10, 240, 104},4002, alive=192591, securePort=-1, UDP Port=-1, id={-96 126 -95 87 -79 5 76 -125 -101 -68 -56 -60 90 -22 -1 7 }, payload={}, command={}, domain={}, ]] message. Will verify.

      Nov 24, 2014 6:06:23 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared

      INFO: Verification complete. Member already disappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4002,{10, 10, 240, 104},4002, alive=192591, securePort=-1, UDP Port=-1, id={-96 126 -95 87 -79 5 76 -125 -101 -68 -56 -60 90 -22 -1 7 }, payload={}, command={}, domain={}, ]]
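Tribes prints the faulty member in its own notation, with the IPv4 address written as a list of octets. A small helper like the one below turns that into the usual host:port form, which makes it easier to compare against the addresses bound to em1 and em1:1. This is a hypothetical sketch, not part of Tomcat or ColdFusion, and it assumes the IPv4 `tcp://{a, b, c, d}:port` format seen in this log.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Tomcat or ColdFusion): rewrites the member
// notation Tribes prints in its logs, e.g. "tcp://{10, 10, 240, 104}:4002",
// as a conventional "host:port" string.
final class TribesMemberAddress {
    // Four IPv4 octets inside braces, followed by the receiver port.
    private static final Pattern MEMBER = Pattern.compile(
            "tcp://\\{(\\d{1,3}),\\s*(\\d{1,3}),\\s*(\\d{1,3}),\\s*(\\d{1,3})\\}:(\\d+)");

    private TribesMemberAddress() {
    }

    /** Returns the first member address found in the text as "a.b.c.d:port". */
    static String toHostPort(String logText) {
        Matcher m = MEMBER.matcher(logText);
        if (!m.find()) {
            throw new IllegalArgumentException("no Tribes member address in: " + logText);
        }
        return m.group(1) + "." + m.group(2) + "." + m.group(3) + "." + m.group(4)
                + ":" + m.group(5);
    }

    public static void main(String[] args) {
        // Fragment copied from the SEVERE message in the log above.
        String line = "Faulty members:tcp://{10, 10, 240, 104}:4002;";
        System.out.println(toHostPort(line)); // prints 10.10.240.104:4002
    }
}
```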

       

       

       

      I have tried all the usual stuff, like increasing the timeouts and juggling ports and addresses.  It seems that the two instances in the cluster simply cannot communicate with each other.
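One way to test that independently of ColdFusion is a plain TCP connect to the other instance's Receiver port (4001 for cfusion1 and 4002 for cfusion2 in the server.xml files below; the failing sends in the log go to 10.10.240.104:4002). The sketch below is a hypothetical diagnostic, not a Tomcat tool. A successful connect only shows the port is reachable and listening; it does not exercise replication itself. Also note that with autoBind="100" a Receiver may bind a higher port if its configured one is busy, so use the port the log actually reports.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical diagnostic: can a plain TCP connection be opened to a cluster
// member's Tribes Receiver port? Success only shows the port is reachable and
// listening; it does not exercise the replication protocol itself.
final class ReceiverPortCheck {
    private ReceiverPortCheck() {
    }

    /** Returns true if a TCP connect to host:port completes within timeoutMs. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Refused, timed out, unreachable, or unknown host.
            return false;
        } finally {
            try {
                socket.close();
            } catch (IOException ignored) {
                // Nothing useful to do if close fails.
            }
        }
    }

    public static void main(String[] args) {
        // Defaults to the peer named in the log; override with: <host> <port>
        String host = args.length > 0 ? args[0] : "10.10.240.104";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 4002;
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 3000));
    }
}
```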

       

      My server.xml files have been mildly tweaked for PCI compliance, so we have an SSL redirect.  I have tried going back to stock, but that doesn't seem to help either.

       

      # cat /opt/coldfusion10/cfusion1/runtime/conf/server.xml

      <Server port="8008" shutdown="SHUTDOWN">

        <Listener className="org.apache.catalina.core.AprLifecycleListener" SSLEngine="on">

        </Listener>

        <Listener className="org.apache.catalina.core.JasperListener">

        </Listener>

        <Listener className="org.apache.catalina.core.JreMemoryLeakPreventionListener">

        </Listener>

        <Listener className="org.apache.catalina.mbeans.GlobalResourcesLifecycleListener">

        </Listener>

        <GlobalNamingResources>

          <Resource description="User database that can be updated and saved" name="UserDatabase" pathname="conf/tomcat-users.xml" factory="org.apache.catalina.users.MemoryUserDatabaseFactory" type="org.apache.catalina.UserDatabase" auth="Container">

          </Resource>

        </GlobalNamingResources>

        <Service name="Catalina">

          <Executor name="tomcatThreadPool" minSpareThreads="4" maxThreads="150" namePrefix="catalina-exec-">

          </Executor>

          <Connector port="8501" protocol="org.apache.coyote.http11.Http11Protocol" connectionTimeout="20000" redirectPort="8446" executor="tomcatThreadPool" maxThreads="50">

          </Connector>

          <Connector port="8446" sslEnabledProtocols="TLSv1, TLSv1.1, TLSv1.2" protocol="HTTP/1.1" keystorePass="xxxxxxxx" SSLEnabled="true" scheme="https" secure="true" keystoreFile="/home/.keystore" keyAlias="tomcat" maxThreads="150" ciphers="TLS_RSA_WITH_AES_128_CBC_SHA, TLS_DHE_RSA_WITH_AES_128_CBC_SHA,                 TLS_DHE_DSS_WITH_AES_128_CBC_SHA" clientAuth="false">

          </Connector>

          <Connector port="8013" protocol="AJP/1.3" redirectPort="8446" tomcatAuthentication="false">

          </Connector>

          <Engine jvmRoute="cfusion1" name="Catalina" defaultHost="localhost">

            <Realm className="org.apache.catalina.realm.LockOutRealm">

              <Realm className="org.apache.catalina.realm.UserDatabaseRealm" resourceName="UserDatabase">

              </Realm>

            </Realm>

            <Host name="localhost" autoDeploy="false" unpackWARs="true" appBase="webapps">

              <Valve pattern="%h %l %u %t &quot;%r&quot; %s %b" directory="logs" prefix="localhost_access_log." className="org.apache.catalina.valves.AccessLogValve" suffix=".txt" resolveHosts="false">

              </Valve>

            </Host>

            <Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster" channelSendOptions="6">

              <Manager notifyListenersOnReplication="true" expireSessionsOnShutdown="false" className="org.apache.catalina.ha.session.DeltaManager">

              </Manager>

              <Channel className="org.apache.catalina.tribes.group.GroupChannel">

                <Membership port="45564" dropTime="10000" address="228.0.0.104" className="org.apache.catalina.tribes.membership.McastService" frequency="500">

                </Membership>

                <Receiver port="4001" autoBind="100" address="auto" selectorTimeout="10000" maxThreads="6" className="org.apache.catalina.tribes.transport.nio.NioReceiver">

                </Receiver>

                <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">

                  <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender" timeout="30000">

                  </Transport>

                </Sender>

                <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector">

                </Interceptor>

                <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor">

                </Interceptor>

              </Channel>

              <Valve className="org.apache.catalina.ha.tcp.ReplicationValve" filter="">

              </Valve>

              <Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve">

              </Valve>

              <ClusterListener className="org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener">

              </ClusterListener>

              <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener">

              </ClusterListener>

            </Cluster>

          </Engine>

        </Service>

      </Server>

       

       

      # cat /opt/coldfusion10/cfusion2/runtime/conf/server.xml

      <Server port="8009" shutdown="SHUTDOWN">

        <Listener className="org.apache.catalina.core.AprLifecycleListener" SSLEngine="on">

        </Listener>

        <Listener className="org.apache.catalina.core.JasperListener">

        </Listener>

        <Listener className="org.apache.catalina.core.JreMemoryLeakPreventionListener">

        </Listener>

        <Listener className="org.apache.catalina.mbeans.GlobalResourcesLifecycleListener">

        </Listener>

        <GlobalNamingResources>

          <Resource description="User database that can be updated and saved" name="UserDatabase" pathname="conf/tomcat-users.xml" factory="org.apache.catalina.users.MemoryUserDatabaseFactory" type="org.apache.catalina.UserDatabase" auth="Container">

          </Resource>

        </GlobalNamingResources>

        <Service name="Catalina">

          <Executor name="tomcatThreadPool" minSpareThreads="4" maxThreads="150" namePrefix="catalina-exec-">

          </Executor>

          <Connector port="8502" protocol="org.apache.coyote.http11.Http11Protocol" connectionTimeout="20000" redirectPort="8447" executor="tomcatThreadPool" maxThreads="50">

          </Connector>

          <Connector port="8447" sslEnabledProtocols="TLSv1, TLSv1.1, TLSv1.2" protocol="HTTP/1.1" keystorePass="xxxxxxxx" SSLEnabled="true" scheme="https" secure="true" keystoreFile="/home/.keystore" keyAlias="tomcat" maxThreads="150" ciphers="TLS_RSA_WITH_AES_128_CBC_SHA, TLS_DHE_RSA_WITH_AES_128_CBC_SHA,                 TLS_DHE_DSS_WITH_AES_128_CBC_SHA" clientAuth="false">

          </Connector>

          <Connector port="8014" protocol="AJP/1.3" redirectPort="8447" tomcatAuthentication="false">

          </Connector>

          <Engine jvmRoute="cfusion2" name="Catalina" defaultHost="localhost">

            <Realm className="org.apache.catalina.realm.LockOutRealm">

              <Realm className="org.apache.catalina.realm.UserDatabaseRealm" resourceName="UserDatabase">

              </Realm>

            </Realm>

            <Host name="localhost" autoDeploy="false" unpackWARs="true" appBase="webapps">

              <Valve pattern="%h %l %u %t &quot;%r&quot; %s %b" directory="logs" prefix="localhost_access_log." className="org.apache.catalina.valves.AccessLogValve" suffix=".txt" resolveHosts="false">

              </Valve>

            </Host>

            <Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster" channelSendOptions="6">

              <Manager notifyListenersOnReplication="true" expireSessionsOnShutdown="false" className="org.apache.catalina.ha.session.DeltaManager">

              </Manager>

              <Channel className="org.apache.catalina.tribes.group.GroupChannel">

                <Membership port="45564" dropTime="10000" address="228.0.0.104" className="org.apache.catalina.tribes.membership.McastService" frequency="500">

                </Membership>

                <Receiver port="4002" autoBind="100" address="auto" selectorTimeout="10000" maxThreads="6" className="org.apache.catalina.tribes.transport.nio.NioReceiver">

                </Receiver>

                <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">

                  <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender" timeout="30000">

                  </Transport>

                </Sender>

                <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector">

                </Interceptor>

                <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor">

                </Interceptor>

              </Channel>

              <Valve className="org.apache.catalina.ha.tcp.ReplicationValve" filter="">

              </Valve>

              <Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve">

              </Valve>

              <ClusterListener className="org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener">

              </ClusterListener>

              <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener">

              </ClusterListener>

            </Cluster>

          </Engine>

        </Service>

      </Server>
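Both server.xml files set channelSendOptions="6" on the Cluster. The value is a bitmask of the SEND_OPTIONS_* constants in org.apache.catalina.tribes.Channel. The sketch below copies the four lowest bit values from that interface, so it runs without Tomcat on the classpath, and decodes the setting. It reports any other set bits as a hex remainder instead of naming them. 6 decodes to USE_ACK plus SYNCHRONIZED_ACK, i.e. synchronous replication: the sender waits for the peer to process and acknowledge each message. The stack trace above shows ReplicationValve sending from an AJP request thread, so with this setting that request thread waits on the peer. Whether this contributes to the queued requests is a hypothesis the logs here do not confirm.

```java
import java.util.ArrayList;
import java.util.List;

// Decodes a Cluster channelSendOptions bitmask into flag names. The bit values
// are copied from the SEND_OPTIONS_* constants in
// org.apache.catalina.tribes.Channel (Tomcat 7); any other set bits are
// reported as a hex remainder rather than guessed at.
final class SendOptionsDecoder {
    static final int SEND_OPTIONS_BYTE_MESSAGE = 0x0001;
    static final int SEND_OPTIONS_USE_ACK = 0x0002;
    static final int SEND_OPTIONS_SYNCHRONIZED_ACK = 0x0004;
    static final int SEND_OPTIONS_ASYNCHRONOUS = 0x0008;

    private SendOptionsDecoder() {
    }

    static List<String> decode(int options) {
        List<String> names = new ArrayList<String>();
        if ((options & SEND_OPTIONS_BYTE_MESSAGE) != 0) {
            names.add("SEND_OPTIONS_BYTE_MESSAGE");
        }
        if ((options & SEND_OPTIONS_USE_ACK) != 0) {
            names.add("SEND_OPTIONS_USE_ACK");
        }
        if ((options & SEND_OPTIONS_SYNCHRONIZED_ACK) != 0) {
            names.add("SEND_OPTIONS_SYNCHRONIZED_ACK");
        }
        if ((options & SEND_OPTIONS_ASYNCHRONOUS) != 0) {
            names.add("SEND_OPTIONS_ASYNCHRONOUS");
        }
        int known = SEND_OPTIONS_BYTE_MESSAGE | SEND_OPTIONS_USE_ACK
                | SEND_OPTIONS_SYNCHRONIZED_ACK | SEND_OPTIONS_ASYNCHRONOUS;
        int rest = options & ~known;
        if (rest != 0) {
            names.add("other bits 0x" + Integer.toHexString(rest));
        }
        return names;
    }

    public static void main(String[] args) {
        // channelSendOptions="6" from both server.xml files above.
        System.out.println("6 -> " + decode(6));
        // prints 6 -> [SEND_OPTIONS_USE_ACK, SEND_OPTIONS_SYNCHRONIZED_ACK]
    }
}
```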

       

       

      Netstat does not show anything else using the same ports.
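netstat only shows sockets that exist at the moment it runs. A complementary check is to try binding each TCP port from the two server.xml files while both instances are stopped: if the bind fails, something else owns the port. The helper below is a hypothetical sketch. It binds on the wildcard address, which only approximates the Receiver's address="auto" behaviour. It also cannot test the multicast membership port 45564, which is UDP.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical helper: reports whether a TCP port can be bound right now.
// Run it with both ColdFusion instances stopped, otherwise their own
// listeners will (correctly) show up as "in use".
final class PortBindCheck {
    // TCP ports taken from the cfusion1 and cfusion2 server.xml files above.
    static final int[] CF_PORTS = {8008, 8009, 8501, 8502, 8446, 8447, 8013, 8014, 4001, 4002};

    private PortBindCheck() {
    }

    /** Returns true if a listening socket can be bound to the port on all interfaces. */
    static boolean isFree(int port) {
        ServerSocket socket = null;
        try {
            socket = new ServerSocket(port);
            return true;
        } catch (IOException e) {
            return false;
        } finally {
            if (socket != null) {
                try {
                    socket.close();
                } catch (IOException ignored) {
                    // Nothing useful to do if close fails.
                }
            }
        }
    }

    public static void main(String[] args) {
        for (int port : CF_PORTS) {
            System.out.println(port + (isFree(port) ? " free" : " IN USE"));
        }
    }
}
```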

       

      Any suggestions?   Any information is greatly appreciated!

       

      Thanks,

      -Tony

        • 1. Re: Cluster Replication issues in CF10 on RHEL6
          vishu#13 Level 3


          The "Manager Pathname" should be commented out in context.xml as well, in both instances.

           

          Follow this: https://forums.adobe.com/message/6361184#6361184

          • 2. Re: Cluster Replication issues in CF10 on RHEL6
            GuitsBoy Level 1

            Manager Pathname has been commented out since the cluster was built.  This was working up until the recent update 14.   When I roll back to update 13, it works correctly, with no such session communication errors in the log files.

             

            What's weird is that it does work, somewhat sporadically, on update 14.  It seems that when the box is in production, with a light load on the machine, I get errors and a dead instance.  But if I shut down httpd, remove the secondary IP address, or even change it to an unused secondary IP address, the instances do seem to communicate, although they still produce errors and take much longer to come up.   The problem seems to depend, at least partly, on load and on handling active requests.

             

            There is definitely a problem with update 14 and session replication as far as I can tell.

            • 3. Re: Cluster Replication issues in CF10 on RHEL6
              mchandna Level 1

              I can see some errors related to the connector, and FYI there are a few problems with the update 14 connectors.

              Adobe has fixed those issues, but the fixes are not yet publicly available. Users can get them from the Adobe Support team.

              I suggest you contact the Adobe Support team to get the latest connector DLLs. After applying the connector patch, check whether you still get these errors on update 14.

               

              Thanks,

              Milan.

              • 4. Re: Cluster Replication issues in CF10 on RHEL6
                GuitsBoy Level 1

                Thanks, but this is a Linux environment, so DLLs won't be of much help.  They may have the equivalent jar files, though.  I have already opened a bug report (3857664).  Is there a better way to reach them?  Thanks for the info; glad to see I may not be the only one with problems.