Too many open socket connections causing ColdFusion to crash?

Guest
Feb 14, 2011

I’m currently working on an e-commerce site which sends and receives information to/from the client’s order management system (OMS) via XML over a TCP/IP socket. It uses a very old Java-based custom tag called CFX_JSOCKET (which appears to have been written in 2002) to open the socket, send the data, and get the response. The code that calls the custom tag and sends/receives data from the OMS pre-dates my working on the site, but it’s always worked, so I haven’t paid it much attention.

Back in the summer of 2009 we started experiencing issues with ColdFusion (v.7 on Windows 2003 at the time) locking up on a more and more frequent basis, until it ultimately became a daily issue. After extensive research we narrowed the issue down to the communication between the web server and our client’s order management server. It seemed the issue with ColdFusion hanging was either related to there being too many connections open, or to these connections hanging and resulting in dead threads. This was an educated guess based on a blog post I’d seen online, not actual monitoring of either CF or the TCP/IP connections. As soon as we dialed back the timeout on the CFX_JSOCKET tag from 20 seconds to 10, the issue disappeared, so we left it at that and moved on.

Fast forward to this January. The site is hosted at a new location, on a 64-bit Windows 2008 box running ColdFusion 9. Over the years traffic on the site has continued to grow. The nature of the client’s business means that August and January are their busiest times of the year (back to school for college kids), and in January ColdFusion once again started locking up on an almost-daily basis.

One significant difference is that the address cleansing software that previously ran on the box and was used to verify shipping addresses is not available for 64-bit, so when we moved to the new server last summer, that task was moved to the client’s order management software and handled via XML like all other interaction with that system. However, while most XML calls to that server (order input, inventory check, etc) take under a second to complete, the address cleansing call regularly takes over 5 seconds to return data, and frequently times out. 

Once we eliminated the address cleansing call from the checkout process, ColdFusion once again stopped locking up regularly.  So it appears that once again it’s the communication between the web server and the order management server that’s causing problems. We currently have that address cleansing call disabled on the web site in order to keep ColdFusion from crashing, but that’s not a long term solution.

We don’t have, nor can I find online, the source code for the CFX_JSOCKET custom tag, so I decided I’d write some CF code utilizing the Java methods to open the socket, send the data, get the response, and close the connection. My test code is working fine (under no load). However, in trying to troubleshoot an issue I had with it, I started monitoring the TCP/IP connections using TCPView, and I noticed that all the connections to the order management server, whether opened via the custom tag or my new code, remain open in either a TIME_WAIT or FIN_WAIT2 status for well over 2 minutes, even though I know for a fact that my new code is definitely closing the connection from the web server side.
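For readers following along, here is a minimal Java sketch of the open/send/read/close sequence described above. It is not the CFX_JSOCKET implementation (its source is unavailable) and not the poster's actual code. The class name `OmsSocketClient`, the method name, and both timeout parameters are hypothetical, and it assumes the server closes the connection once it has sent its reply, which the thread does not confirm.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;

public class OmsSocketClient {

    /**
     * Sends one request and reads the reply until the server closes its side.
     * Host, port, and both timeouts are placeholders, not values from the real OMS.
     */
    public static String sendRequest(String host, int port, String requestXml,
                                     int connectTimeoutMs, int readTimeoutMs)
            throws IOException {
        Socket socket = new Socket();
        try {
            // Give up if the server does not accept the connection in time.
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
            // Bound every blocking read so a stalled server cannot hold the thread forever.
            socket.setSoTimeout(readTimeoutMs);

            OutputStream out = socket.getOutputStream();
            out.write(requestXml.getBytes("UTF-8"));
            out.flush();

            // Assumption: the reply ends when the server closes the connection.
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                reply.write(buffer, 0, n);
            }
            return reply.toString("UTF-8");
        } finally {
            // Always release the socket, even when a read times out.
            socket.close();
        }
    }
}
```

The read timeout matters as much as the close: if a read exceeds `readTimeoutMs`, `read` throws `SocketTimeoutException`, the `finally` block still closes the socket, and the calling request thread is freed instead of waiting indefinitely.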

They do all close eventually, but I’m wondering 1. Why they’re remaining open that long; 2. Is that normal; and 3. If all these connections remaining open could be what’s causing ColdFusion to choke. 

Does this sound plausible? If so, does anyone have any suggestions/recommendations about how to fix it? My research seems to indicate this might be a matter of the order management system not closing the connection on its end, but I’m in way over my head, and before I go to the client and tell them it’s their OMS causing the issue, I need to feel a little more confident that I’m on the right track.

Any help or advice would be very greatly appreciated.  And thanks for taking the time to read through my long-winded explanation of the problem.

Set-up details:

ColdFusion Version: 9,0,0,251028  Standard 

Operating System: Windows Server 2008 

Java Version: 1.6.0_14 

Java VM Name: Java HotSpot(TM) 64-Bit Server VM 

Java VM Version: 14.0-b16 

Thanks,

Laurie

TOPICS
Advanced techniques

Guide ,
Feb 14, 2011

Hi Laurie,

Sorry to say I have no experience with the custom tag CFX_JSOCKET. From your good description, I guess a resource in CF or Java is being used up. I suspect CF threads (CFadmin > Server Settings > Request Tuning > Tag Limit Settings > Maximum number of threads available for CFTHREAD). Unfortunately, with CF Standard the thread total is throttled to 10, which could be bad news, since we cannot raise it in CFadmin to see whether the problem eases or stops.

I think if you did some CF Metrics logging (where you can check on such things as threads), you might find out which resource is being stressed and then, with that knowledge, make an adjustment (hopefully not the CFTHREAD one).

Let me know if you want some details on enabling CF Metrics logging. Perhaps others will have better clues so you can look forward to their suggestions.

HTH, Carl.

Guide ,
Feb 14, 2011

Hi Laurie,

I'm not aware of a custom tag called CFX_JSOCKET. I guess the process you described so well is consuming a resource until you hit a problem. The trick is knowing which parameter to adjust. Perhaps you are running out of one of the thread types under CFadmin > Server Settings > Request Tuning.

I expect that if you enable CF Metrics logging, where you can log the threads and other resources, you can find out which parameter needs adjusting. Let me know if you want some details on enabling CF Metrics. Perhaps others will have a much better idea than me and can help without the overhead of logging.

The other interesting thing is that you are using CF 9.0.0. Do you have a reason for not being on Updater 1 (CF 9.0.1)?

HTH, Carl.

PS: I posted before, but that post seems to have gone. I just hope it does not come back and leave me having posted twice.

Enthusiast ,
Feb 15, 2011

They do all close eventually, but I’m wondering 1. Why they’re remaining open that long; 2. Is that normal; and 3. If all these connections remaining open could be what’s causing ColdFusion to choke.

1. TIME_WAIT is normal and prevents stray network packets from being treated as valid packets. FIN_WAIT_2 might mean that the other side is not closing the connection.

2. For TIME_WAIT, yes; for FIN_WAIT_2, I don't think so (I could be wrong).

3. TIME_WAIT connections are usually handled by the OS because they're basically closed, so they shouldn't hold CF threads. FIN_WAIT_2, on the other hand, means that the connection is not closed yet, so the thread holding the connection might be counting against the "Maximum number of simultaneous Template requests" setting in CF admin.

Does this sound plausible?  If so, does anyone have any suggestions/recommendations about how to fix it?  My research seems to indicate this might be a matter of the order management system not closing the connection on its end, but I’m in way over my head, and before I go to client and tell them it’s their OMS causing the issue, I need to feel a little more confident that I’m on the right track.

Watch the connections with TCPView from Sysinternals (how many connections are in FIN_WAIT_2, and how many in TIME_WAIT? Are they all to the order management system?) and watch CF with the metrics service (how many busy threads? how many idle?): http://kb2.adobe.com/cps/191/tn_19120.html
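If watching TCPView by eye gets tedious, the same tally can be scripted. The sketch below is only an illustration: it assumes the input is text in the four-column `netstat -an` layout (Proto, Local Address, Foreign Address, State), and the class name `TcpStateCounter` is made up. It does not run `netstat` itself; it only counts rows it is given.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TcpStateCounter {

    /**
     * Counts TCP connection states in "netstat -an" style lines
     * (columns: Proto, Local Address, Foreign Address, State).
     * If remoteAddress is non-null, only rows whose foreign address
     * equals it exactly (for example "10.0.0.9:9000") are counted.
     */
    public static Map<String, Integer> countStates(List<String> lines, String remoteAddress) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String line : lines) {
            String[] cols = line.trim().split("\\s+");
            // Skip the header, UDP rows, and anything without a state column.
            if (cols.length < 4 || !"TCP".equalsIgnoreCase(cols[0])) {
                continue;
            }
            if (remoteAddress != null && !remoteAddress.equals(cols[2])) {
                continue;
            }
            Integer previous = counts.get(cols[3]);
            counts.put(cols[3], previous == null ? 1 : previous + 1);
        }
        return counts;
    }
}
```

Calling `countStates(lines, null)` tallies every TCP row, while passing the OMS address as the second argument answers the "are they all to the order management system?" question directly. Taking a snapshot every minute or so during a busy period would show whether any one state keeps growing.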

Also, upgrade CF to 9.0.1 and Java to the latest version.

--

Mack

Guest
Feb 15, 2011

Thanks to those who've replied so far.

I was in the process of setting up/playing with the CF Metrics on the client's dev server this afternoon when the production site ground to a halt. This time around I had TCPView set up on the server, and when I logged in I noticed that the vast majority of connections (primarily internet traffic to the site) were in a CLOSE_WAIT status. There was only one open connection to the client's order management system at the time, and it was in FIN_WAIT2 status.

Naturally, when we restarted ColdFusion all those CLOSE_WAIT connections went away.

In the coldfusion-out.log, in the 5 minutes leading up to the site going down, there are between 15 and 20 errors similar to the following:

02/15 15:44:08 Warning [jrpp-487] - Thread: jrpp-487, processing template: C:\inetpub\wwwroot\www.domain.com\history\orderDetail.cfm, completed in 42 seconds, exceeding the 30 second warning limit

The length of time needed to complete increases with each warning. Some of the files referred to in the warning messages are files that interact with the OMS; some aren't.

It all came to a head with a half dozen instances of the following error:

java.lang.RuntimeException: Request timed out waiting for an available thread to run. You may want to consider increasing the number of active threads in the thread pool.
    at jrunx.scheduler.ThreadPool$Throttle.enter(ThreadPool.java:116)
    at jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.java:425)
    at jrunx.scheduler.ThreadPool$UpstreamMetrics.invokeRunnable(ThreadPool.java:266)
    at jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)

Then we restarted CF and the log is full of start-up related messages.

Anyone have any thoughts or suggestions as to what direction to go in next?

Guide ,
Feb 15, 2011

I still like the idea of CF Metrics to see what those threads and other JVM-related things are doing.

Take a backup copy of ColdFusion\runtime\servers\coldfusion\SERVER-INF\jrun.xml

Edit this bit of jrun.xml:

<attribute name="metricsEnabled">true</attribute>

    <attribute name="metricsLogFrequency">60</attribute>

    <attribute name="metricsFormat">{listenTh},{idleTh},{delayTh},{busyTh},{totalTh},{delayRq},{droppedRq},{handledRq},{handledMs},{delayMs},{freeMemory},{totalMemory},{sessions},{sessionsInMem}</attribute>

   

Set metricsEnabled to true to turn logging on. metricsLogFrequency is in seconds (60, 30, or 10, depending on how much logging you want; maybe 60 for starters). metricsFormat sets which metrics we are going to look at.
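Once the log is filling up, the numbers can be labelled programmatically. The following is a rough Java sketch, assuming each logged entry contains the values in the same comma-separated order as the metricsFormat attribute above, and that any timestamp or prefix in front of them has already been stripped (the exact log prefix is not shown in this thread). The class `MetricsLine` and the `looksStarved` heuristic are invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MetricsLine {

    // Field order copied from the metricsFormat attribute in jrun.xml.
    private static final String[] FIELDS = {
        "listenTh", "idleTh", "delayTh", "busyTh", "totalTh",
        "delayRq", "droppedRq", "handledRq", "handledMs", "delayMs",
        "freeMemory", "totalMemory", "sessions", "sessionsInMem"
    };

    /** Maps one comma-separated row of values onto the field names above. */
    public static Map<String, Long> parse(String csvValues) {
        String[] parts = csvValues.trim().split(",");
        if (parts.length != FIELDS.length) {
            throw new IllegalArgumentException(
                "Expected " + FIELDS.length + " values but got " + parts.length);
        }
        Map<String, Long> row = new LinkedHashMap<String, Long>();
        for (int i = 0; i < FIELDS.length; i++) {
            row.put(FIELDS[i], Long.parseLong(parts[i].trim()));
        }
        return row;
    }

    /** Heuristic only: no idle threads while at least one request is delayed. */
    public static boolean looksStarved(Map<String, Long> row) {
        return row.get("idleTh").longValue() == 0L
            && row.get("delayRq").longValue() > 0L;
    }
}
```

Running every logged row through `parse` and noting when `looksStarved` first returns true gives a timestamp to line up against the TCPView snapshots and the slow-template warnings.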

The log details can sometimes be hard to read when they are just numbers, so I normally like to load them into a graphing package, which can make them somewhat easier to understand. If you have a CF system working OK, perhaps enable CF Metrics on it as well so you can compare logs and learn what good and bad look like. I cannot seem to find the details on the graphical CF Metrics viewer at the moment. I will look more later and reply with further detail.

HTH, Carl.

PS
link to the CF metric viewer app mentioned:


http://www.adobe.com/cfusion/exchange/index.cfm?event=extensionDetail&extid=1029407

Enthusiast ,
Feb 16, 2011

java.lang.RuntimeException: Request timed out waiting for an available thread to run. You may want to consider increasing the number of active threads in the thread pool.

This error suggests that you need to increase the setting for "Maximum number of simultaneous Template requests" in CF admin. Enable CF metrics and watch the number of idle threads; my guess is that it decreases over time and is zero when the server is blocked.
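To make the "idle threads drop to zero" guess concrete, here is a small Java analogy. It uses a plain `java.util.concurrent` fixed pool, not JRun's own thread pool, and the class `PoolStarvationDemo` and its parameters are hypothetical. Blocked tasks stand in for requests stuck waiting on a slow external call.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PoolStarvationDemo {

    /**
     * Fills a fixed pool with tasks that block (standing in for slow calls),
     * then checks whether one more quick task can finish within waitMs.
     * Returns true if the quick task was still waiting when waitMs ran out.
     */
    public static boolean quickTaskStarves(int poolSize, int slowTasks, long waitMs)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        final CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < slowTasks; i++) {
                pool.submit(new Runnable() {
                    public void run() {
                        try {
                            release.await(); // blocked, like a thread stuck on a slow socket
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    }
                });
            }
            Future<?> quick = pool.submit(new Runnable() {
                public void run() {
                    // A fast page that needs no external call at all.
                }
            });
            try {
                quick.get(waitMs, TimeUnit.MILLISECONDS);
                return false; // a worker was free
            } catch (TimeoutException e) {
                return true;  // every worker stayed busy for the whole wait
            }
        } finally {
            release.countDown(); // unblock the slow tasks so the pool can exit
            pool.shutdown();
        }
    }
}
```

With a pool of 2, `quickTaskStarves(2, 2, 200)` returns true while `quickTaskStarves(2, 1, 2000)` returns false. The same pattern would explain why templates that never touch the OMS also show up in the slow-request warnings: once every worker is tied up, unrelated requests queue behind them. Raising the limit buys headroom, but it does not shorten the slow calls themselves.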

--

Mack
