Since our transition to IDS CS5, we often see CORBA instances run into what looks like a deadlock, seemingly at random.
By a dead instance I mean that the process is still running and the IOR file still exists and appears valid (its content string is the same as at startup), but the application object is lost. It happens at random points during execution of our code, almost always in a different part of it.
I can reproduce this state of InDesign Server by running a CORBA instance on my MacBook Pro, closing the lid, and waiting about an hour. When I then reopen it and try to connect, I get:
org.omg.CORBA.COMM_FAILURE: vmcid: SUN minor code: 201 completed: No
at com.sun.corba.se.impl.logging.ORBUtilSystemException.connectFailure(ORBUtilSystemException.java:2200)
at com.sun.corba.se.impl.logging.ORBUtilSystemException.connectFailure(ORBUtilSystemException.java:2221)
at com.sun.corba.se.impl.transport.SocketOrChannelConnectionImpl.<init>(SocketOrChannelConnectionImpl.java:205)
at com.sun.corba.se.impl.transport.SocketOrChannelConnectionImpl.<init>(SocketOrChannelConnectionImpl.java:218)
at com.sun.corba.se.impl.transport.SocketOrChannelContactInfoImpl.createConnection(SocketOrChannelContactInfoImpl.java:101)
at com.sun.corba.se.impl.protocol.CorbaClientRequestDispatcherImpl.beginRequest(CorbaClientRequestDispatcherImpl.java:152)
at com.sun.corba.se.impl.protocol.CorbaClientDelegateImpl.request(CorbaClientDelegateImpl.java:118)
at com.adobe.ids.basics._ApplicationStub.updateFonts(Unknown Source)
at be.nss.documentserver.controller.action.OpenSubDocumentAction.execute(OpenSubDocumentAction.java:50)
Caused by: java.lang.NullPointerException
at com.sun.corba.se.impl.transport.DefaultSocketFactoryImpl.createSocket(DefaultSocketFactoryImpl.java:59)
at com.sun.corba.se.impl.transport.SocketOrChannelConnectionImpl.<init>(SocketOrChannelConnectionImpl.java:188)
... 9 more
This is exactly what I get on our customers' servers (which obviously don't go to sleep). Merely setting my MacBook Pro to a very short sleep time (1 min.) doesn't trigger this error, which makes the problem hard to reproduce. Apparently the sleep entered by closing the MBP and waiting an hour isn't the same as the sleep entered after 1 min. with all parameters on.
All machines run OS X 10.6.
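To tell an unreachable listener apart from a lost application object, one thing we could try is a plain TCP probe of the host and port the instance listens on. This is only a minimal sketch, assuming the host and port are known from how the instance was started. `PortProbe` and `isReachable` are names made up for this post and are not part of any Adobe or CORBA API.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks only that something accepts TCP connections
// on host:port. It does not speak CORBA/IIOP at all.
final class PortProbe {
    private PortProbe() {}

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or no route to host.
            return false;
        } finally {
            try {
                socket.close();
            } catch (IOException ignored) {
                // Nothing useful to do if closing the probe socket fails.
            }
        }
    }
}
```

A successful probe only shows that the socket still accepts connections. It says nothing about whether the application object behind it is still alive, so it can narrow the problem down but not confirm a healthy instance.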
So I have two questions:
1) What are the possible causes of these deadlocks? Errors in our code (both Java and our own IDS plugin) normally cause a real instance crash, which is traceable.
2) Is there a way to re-establish a connection to a dead instance?
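To make question 2 concrete, this is roughly the kind of retry-and-reconnect wrapper we have in mind. It is a sketch under assumptions, not something we know to work against a dead instance: `ReconnectingCaller` and `Reconnector` are hypothetical names, and the reconnect hook is where one would re-read the IOR file and resolve a fresh application reference (for example via `ORB.string_to_object`). The sketch deliberately avoids the `org.omg.CORBA` classes so that it compiles on its own.

```java
import java.util.concurrent.Callable;

// Hypothetical wrapper: retries a request after asking a hook to rebuild
// the connection. Whether a dead IDS instance can be revived this way is
// exactly the open question.
final class ReconnectingCaller {

    /** Hypothetical hook, e.g. re-read the IOR file and resolve a new reference. */
    interface Reconnector {
        void reconnect() throws Exception;
    }

    private final Reconnector reconnector;
    private final int maxAttempts;

    ReconnectingCaller(Reconnector reconnector, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.reconnector = reconnector;
        this.maxAttempts = maxAttempts;
    }

    <T> T call(Callable<T> request) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return request.call();
            } catch (RuntimeException e) {
                // CORBA system exceptions such as COMM_FAILURE are
                // RuntimeExceptions; real code should catch only those.
                last = e;
                if (attempt < maxAttempts) {
                    reconnector.reconnect();
                }
            }
        }
        throw last; // every attempt failed
    }
}
```

Catching every `RuntimeException` is too broad for production use. It is done here only to keep the sketch independent of the CORBA classes.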
Thanks in advance for any input!
Does no one have any input? This has become a real problem, reported by several of our customers since we changed to CS5.
Can anyone point us to possible directions we should investigate?
What we checked already:
1) That no more than one thread accesses an instance at a time.
2) We looked into what the error log actually means and found that this problem has been reported since Java 1.4.20. We compile all code for 1.5, but customers run Java 1.6.24/26. We don't know whether this matters.
3) We checked the ORB string and found that the connection is made with an IP address rather than a server name, which rules out DNS problems.
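For check 1, the guarantee we rely on can be sketched as one lock per instance. This is an illustration of the idea rather than our actual code: `InstanceGate`, `runExclusive`, and the instance id string are hypothetical names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical gate: all work for one IDS instance goes through the same
// lock, so at most one thread talks to that instance at a time.
final class InstanceGate {
    private final ConcurrentMap<String, ReentrantLock> locks =
            new ConcurrentHashMap<String, ReentrantLock>();

    <T> T runExclusive(String instanceId, Callable<T> work) throws Exception {
        ReentrantLock lock = lockFor(instanceId);
        lock.lock();
        try {
            return work.call();
        } finally {
            lock.unlock();
        }
    }

    // Returns the single lock for this instance, creating it on first use.
    private ReentrantLock lockFor(String instanceId) {
        ReentrantLock existing = locks.get(instanceId);
        if (existing != null) {
            return existing;
        }
        ReentrantLock created = new ReentrantLock();
        ReentrantLock raced = locks.putIfAbsent(instanceId, created);
        return raced != null ? raced : created;
    }
}
```

Threads working on different instances do not block each other, because each instance id maps to its own lock.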
We are really stuck here.
Thanks in advance for any useful input.