We are prototyping a clustered BlazeDS / Flex solution,
and we can't figure out how to make the failover mechanism work every time.
We need our IHM (the client UI) to (re)connect transparently to a destination if a BlazeDS server fails.
Our architecture looks like this:
Client layer: MS-IE / Flash Player 10
Flex application consuming streaming AMF messages.
L1 layer: two Linux servers, each with a Tomcat 6 server / JDK 1.5
BlazeDS exposes a clustered destination (with JGroups) via a Streaming AMF Channel.
Data is obtained via the JMSAdapter from a remote JMS source.
L2 layer: Linux servers with a Tomcat 6 server / ActiveMQ 5.1 / JDK 1.5
JMS broker serving a JMS topic.
We ran the following test and got inconsistent results:
1- Start the L2 JMS-Producer.
2- Start L1-a and L1-b BlazeDS servers :
The JGroups cluster starts correctly.
3- Start an IHM with http://L1-a:8180/.... :
Connection works perfectly.
DS-Console@L1-a JMS-Consumer count is 1.
DS-Console@L1-a Message-Subscribers count is 1.
4- Start an IHM on http://L2-a:8180/.... :
Connection works perfectly.
DS-Console@L2-a JMS-Consumer count is 1.
DS-Console@L2-a Message-Subscribers count is 1.
5- Now we kill the Tomcat server on L1-a (Ctrl+C)
IHM on L2-a continues to work OK.
IHM on L1-a: detects 'connection loss'.
On this event, we re-create the consumer, but it rarely reconnects.
Most of the time (95 times out of 100) the IHM shows an error: "Consumer subscribe error - The consumer was not able to subscribe to its target destination."
The DS-console on L2-a gives the following information at this point :
- StreamingAMF / Streaming Client count = 1
- MessageDestination.JMSAdapter, topicConsumerCount = 0
- MessageService.MessageDestination.SubscriptionManager, SubscriberCount = 0
Sometimes we get an error message on the server: "max-streaming-connection-per-session limit of 1 exceeded"
If we bring down the network interface of L1-a, the problem occurs 100% of the time.
Does anybody know why this clustered setup does not fail over reliably?
Thanks for your help.
(Quentin_Buonanno), Feb 3, 2009 6:17 AM
Hi. So, if everything is working correctly the Consumer should be failing over to the new server without you needing to do anything. I'm not sure why you are listening for a disconnect event and then re-creating the Consumer. You shouldn't need to do that.
When you are using a destination that is clustered, the client should get a list of the other servers in the cluster when it connects to the first server. On the Channel you are using, this list of servers is kept in the failoverURIs property.
When the client gets disconnected from the server it is connected to, it should go through the list of endpoints in the failoverURIs property and try to connect to each endpoint listed there.
You should be able to see this happening if you turn on client side logging. Here are instructions for how to do this if you haven't done it before.
When you kill the Tomcat server on L1-a, the flashlog should show the client connected to L1-a losing its connection and then attempting to fail over to another endpoint from the list of failoverURIs. If you see something different, please reply with the contents of your flashlog.txt and I'll take a look.
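The failover behaviour described in this reply (walk the failoverURIs list and try each endpoint in turn) can be pictured with a small model. This is only a sketch in plain Java, not the Flex Channel implementation: the class `FailoverSketch`, the method `connectWithFailover`, and the `tryConnect` callback are hypothetical names, and the endpoints are represented by the labels "L1-a" and "L1-b" rather than real URLs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

/** Hypothetical model of client-side failover; not the actual Flex Channel code. */
class FailoverSketch {

    /**
     * Tries the primary endpoint first, then each failover endpoint in list order.
     * Returns the first endpoint that accepts a connection, or empty if none does.
     */
    static Optional<String> connectWithFailover(String primary,
                                                List<String> failoverUris,
                                                Predicate<String> tryConnect) {
        if (tryConnect.test(primary)) {
            return Optional.of(primary);
        }
        for (String uri : failoverUris) {
            if (tryConnect.test(uri)) {
                return Optional.of(uri);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumption for the demo: L1-a is down, every other endpoint is reachable.
        Predicate<String> reachable = endpoint -> !endpoint.equals("L1-a");

        Optional<String> chosen =
                connectWithFailover("L1-a", Arrays.asList("L1-b"), reachable);

        System.out.println(chosen.orElse("no endpoint reachable"));
    }
}
```

With L1-a marked as down, the demo selects L1-b. Note that the model only covers choosing an endpoint; whether the subscription is restored afterwards is a separate question.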
==> The consumer tries the L1-a (vm3) endpoint for reconnection; it fails.
'my-streaming-amf' channel got status. (Object)#0
code = "NetConnection.Connect.Closed"
level = "status"
'my-streaming-amf' pinging endpoint.
Warning: Domain vm4 does not specify a meta-policy. Applying default meta-policy 'master-only'. This configuration is deprecated. See http://www.adobe.com/go/strict_policy_files_fr to fix this problem.
'my-streaming-amf' channel will disconnect and reconnect with with its session identifier ';jsessionid=5D7CC1D53602791DA12DAA1EE09538B2' appended to its endpoint url
'my-streaming-amf' channel is connected.
'2FD5DC32-82A9-F552-02CF-37BD7CCD1380' consumer connected.
==> The consumer finally connects to the L1-b (vm4) endpoint, but it does not receive messages from the channel.
==> The "'my-streaming-amf' channel sending message:" log line is missing.
Test #3: We use a resubscribe button on the IHM
* This test is run just after the previous automatic resubscribe attempt.
The flashlog says:
'2FD5DC32-82A9-F552-02CF-37BD7CCD1380' consumer subscribe.
'my-streaming-amf' channel sending message:
'2FD5DC32-82A9-F552-02CF-37BD7CCD1380' consumer acknowledge for subscribe. Client id '3715A0E2-5C3E-454B-F790-1792E2074E85' new timestamp 1233591142437
'2FD5DC32-82A9-F552-02CF-37BD7CCD1380' consumer acknowledge of '449B2768-11FE-05E2-9656-37C309ED732D'.
==> The system is now working.
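The difference between the automatic reconnect (connected, but no messages) and the manual resubscribe in Test #3 (messages flow again) can be pictured with a small model. This is a sketch in plain Java, not Flex or BlazeDS code: `ModelConsumer` and `onReconnect` are hypothetical names, and the model assumes that losing the connection also drops the subscription, which fits the SubscriberCount = 0 seen in the DS console but is not confirmed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model: a consumer can be connected without being subscribed. */
class ResubscribeSketch {

    /** Minimal stand-in for a messaging consumer; not the Flex Consumer class. */
    static class ModelConsumer {
        boolean connected;
        boolean subscribed;
        final List<String> received = new ArrayList<>();

        /** Channel-level connect: establishes the transport only. */
        void connect() {
            connected = true;
        }

        /** Assumption: a connection loss drops the subscription as well. */
        void disconnect() {
            connected = false;
            subscribed = false;
        }

        /** Subscribing requires a live connection. */
        void subscribe() {
            if (!connected) {
                throw new IllegalStateException("not connected");
            }
            subscribed = true;
        }

        /** A pushed message is delivered only to a connected, subscribed consumer. */
        void push(String message) {
            if (connected && subscribed) {
                received.add(message);
            }
        }
    }

    /** Reconnect handler that re-issues the subscribe after failover. */
    static void onReconnect(ModelConsumer consumer, boolean wasSubscribed) {
        consumer.connect();
        if (wasSubscribed) {
            consumer.subscribe();
        }
    }

    public static void main(String[] args) {
        ModelConsumer consumer = new ModelConsumer();
        consumer.connect();
        consumer.subscribe();
        consumer.push("m1");          // delivered

        consumer.disconnect();        // the first server goes away
        consumer.connect();           // failover restores the transport only
        consumer.push("m2");          // dropped: connected but not subscribed

        consumer.disconnect();
        onReconnect(consumer, true);  // failover plus explicit resubscribe
        consumer.push("m3");          // delivered

        System.out.println(consumer.received);
    }
}
```

Running the demo prints `[m1, m3]`: the message pushed while the consumer was connected but not subscribed is lost, which is the same symptom as in the automatic reconnect test.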
Conclusion: We can't find a way to make the consumer reconnect/resubscribe automatically.
* If we don't catch the disconnect event, the consumer on the clustered destination does not try to reconnect automatically.
* If we catch the disconnect event and force consumer.subscribe(), the consumer reconnects on a failover channel but does not receive messages (it only connects; it does not subscribe).
==> Are we doing something wrong, or is there a workaround?