• Global community
    • Language:
      • Deutsch
      • English
      • Español
      • Français
      • Português
  • 日本語コミュニティ
    Dedicated community for Japanese speakers
  • 한국 커뮤니티
    Dedicated community for Korean speakers
Exit
0

Coldfusion 10 enterprise - mod_jk keeps crashing

Guest
Jul 20, 2013 Jul 20, 2013

Copy link to clipboard

Copied

Environment:

Windows server 2003

CF10 enterprise - update 11 installed

Apache 2.2 with mod_ssl

Error

After about 8 hours of use eventually the logs will start showing this message and no further requests to coldfusion are possible. Static assets are still served by apache however.

[Sun Jul 21 01:00:11 2013] [1796:1092] [error] ajp_service::jk_ajp_common.c (2470): Failed allocating AJP message buffer

[Sun Jul 21 01:00:11 2013] [1796:1092] [info] jk_handler::mod_jk.c (2748): Service error=-5 for worker=cfusion

[Sun Jul 21 01:01:36 2013] [1796:3492] [error] ajp_service::jk_ajp_common.c (2455): Failed allocating AJP message buffer

[Sun Jul 21 01:01:36 2013] [1796:3492] [info] jk_handler::mod_jk.c (2748): Service error=-5 for worker=cfusion

[Sun Jul 21 01:01:37 2013] [1796:2488] [error] ajp_service::jk_ajp_common.c (2455): Failed allocating AJP message buffer

[Sun Jul 21 01:01:37 2013] [1796:2488] [info] jk_handler::mod_jk.c (2748): Service error=-5 for worker=cfusion

Steps taken to try and address

  • Have reinstalled both apache and coldfusion.
  • Have reinstalled the apache connector after fully patching coldfusion
  • Have attempted to tune the workers.properties / server.xml files (e.g http://www.webtrenches.com/post.cfm/resolve-stability-problems-and-speed-up-coldfusion-10)
  • Have tried both multiple instances and single instance mode
  • Have even attempted to get mod_proxy_ajp working, but because this server hosts approximately 20 websites I would need to be able to set up virtual hosts in the web.xml, however for some reason coldfusions version of tomcat this is not working as expected so I get 404's on all index.cfm requests when browsing with a virtual host proxying via the ajp port. Same goes from straight mod_proxying.

I think I am out of ideas short of swtiching to IIS. But have reservations based on the fact CF10 had workers problems with IIS as well and this is win2k3 meaning the horrible iis6 and no native rewrite rules.

Views

3.5K

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Guide ,
Jul 20, 2013 Jul 20, 2013

Copy link to clipboard

Copied

I don't know what is going wrong for you there and CF with apache is not my environment however some things come to mind you can try.

Are those errors reported in "mod_jk" log (cf10\config\wsconfig\N\) file? Any java errors in "cf-out" or "cf-error" logs (cf10\cfusion\logs)? Reason I ask is tomcat connector is reporting a buffer issue, tomcat uses java and sometimes one of the java memory buffers (heap new perm) could be reaching a limit.

Something else that might be worth trying is add tomcat native library to use tomcat AJP apr rather than AJP bio. Weekend for me so if you want some more details on that reply and I will respond when I am at work.

I favour applying similar values as those listed in the link you mention. One point of contention with those values is the worker.cfusion.connection_pool_timeout=60 in mod_jk properties and connectionTimeout= "60000" in server.xml. Some references suggest a value of 10 minutes not 1 minute so that would be pool_timeout=600 and connectionTimeout= "600000". Note you need to restart CF and apache to apply those tomcat changes.

HTH, Carl.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Guest
Jul 20, 2013 Jul 20, 2013

Copy link to clipboard

Copied

The errors are in mod_jk.log and there are no errors in out and error logs on any of the instances. The cf logs just stop at the time of the outage in cf (meaning no more requests are making it to server).

Memory seems fine in fusion reactor. But your mention of it being a memory issue aligns with what I have seen when googling the error. I could try increasing permgen to see if that helps, the main heap seems ok though.

Will try the 10 minute timeout.

And yeah would be interested in the native tomcat library option. I have a mixture of railo and acf clients and have no issues with railo clients on mod_jk/w2k3/apache 2.2 (though the publically released mod_jk, not the custom adobe one). So I find this very odd.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Guide ,
Jul 21, 2013 Jul 21, 2013

Copy link to clipboard

Copied

With Permanent Generation it can be a good idea to set an initial setting. Windows 2003 32 bit I guess? Suggest use -XX:PermSize=192m -XX:MaxPermSize=256

Something else I neglected to mention about tomcat settings. Like java sometimes the initial or minimum settings are more important than maximum. To that end in mod_jk properties and server.xml and thus CF stability and performance can benifit from:
worker.cfusion.connection_pool_minsize="max"/2 (half maximum setting)
minSpareThreads=""max"/2" (ditto half maximum)

Regards, Carl

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Guest
Jul 21, 2013 Jul 21, 2013

Copy link to clipboard

Copied

Yeah I always set a min permsize....many years experience tuning and troubleshooting jvm (i also manually assign the size of newgen etc) but this is the first time I've had an issue with the connector. To be honest the heap has been fine, and I would have expected if it was a permgen issue I would have seen the usual signs in the logs. I increased more because I didn't have any better options so willing to try anything

minSpareThreads do you mean in the executor? It's not listed in the documentation as an attribute for the connector.

http://tomcat.apache.org/tomcat-6.0-doc/config/executor.html

On that note the docs seem to indicate the maxThreads defined by the executor will override any setting in the connector.  So maybe by updating the connector but not the executor based on that blog post, it's actually not doing anything at all?

The maximum number of request processing threads to be created by this Connector, which therefore determines the maximum number of simultaneous requests that can be handled. If not specified, this attribute is set to 200. If an executor is associated with this connector, this attribute is ignored as the connector will execute tasks using the executor rather than an internal thread pool.

Source: http://tomcat.apache.org/tomcat-6.0-doc/config/ajp.html

Bit confused!

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Guide ,
Jul 21, 2013 Jul 21, 2013

Copy link to clipboard

Copied

First beg my pardon I notice a typo in earlier post, should read:

Suggest use -XX:PermSize=192m -XX:MaxPermSize=256m

Regarding - Unless specified maxthreads=200. What I have suggested is to increase minsparethreads since you can realise a default setting of 10 can be not very many threads waiting in pool and forces tomcat to not reach it's potential always removing threads from memory to reduce it's foot print.

Here is the references for tomcat 7, CF10 using a customised 7 not 6:

http://tomcat.apache.org/tomcat-7.0-doc/config/ajp.html

http://tomcat.apache.org/connectors-doc/reference/workers.html

Keep in mind while I have many CF10 production servers they are all IIS and my CF10 apache knowledge is little more that setup configure just to see what it looks like.

To apply a minimum thread setting on IIS tomcat properties file could look similar to EG:

worker.list=cfusion

worker.cfusion.type=ajp13

worker.cfusion.host=localhost

worker.cfusion.port=8012

worker.cfusion.max_reuse_connections=250

worker.cfusion.connection_pool_size = 400

worker.cfusion.connection_pool_minsize= 200

worker.cfusion.connection_pool_timeout = 600

Correspondingly server.xml AJP portion would have this syntax EG:

<Connector port="8012" protocol="AJP/1.3"

redirectPort="8445"

tomcatAuthentication="false"

maxThreads="400"

minSpareThreads="200"

connectionTimeout="600000" />

Something important to mention because I lack enough apache. The tomcat connector and configuration references have some red apache caveats. Applying thread changes that have worked well in an IIS may not suit apache.

You mention your familiar with JVM tuning monitoring. If you are savvy with using JVM tool jconsole (and lesser so jvisualvm) you could use that to monitor graphically what is happening with the tomcat threads. A bit hard to quickly write that up tho if you are interested I did a talk and demo on that and can find the link to recording.

I will post some tomcat native details soon tho I discuss that as well in above mentioned talk.

Regards, Carl.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Guide ,
Jul 21, 2013 Jul 21, 2013

Copy link to clipboard

Copied

Re promised tomcat native library details.

What you could try here is to see if APR relieves the problem. Now don't get me wrong I am just throwing idea's out there for you to try and could be going in the wrong direction. The way I see it is there are not many other log details to go on and you are tackling the issue via1)tomcat adjustments 2)java parameters and I have also suggested 3)monitoring. APR is a tomcat alteration.

When CF starts CF10\cfusion\logs\coldfusion-error.log reports this output EG:

org.apache.catalina.core.AprLifecycleListener init

INFO: The APR based Apache Tomcat Native library which allows optimal

performance in production environments was not found on the java.library.path:

d:\\ColdFusion10\\cfusion\lib;d:\\ColdFusion10\\cfusion\jintegra\bin;d:\\ColdFusio

n10\\cfusion\jintegra\bin\international;d:\\ColdFusion10\\cfusion\lib\oosdk\classe

s\win

org.apache.coyote.AbstractProtocol init

INFO: Initializing ProtocolHandler ["ajp-bio-8012"]

org.apache.catalina.core.StandardService startInternal

INFO: Starting service Catalina

org.apache.catalina.core.StandardEngine startInternal

INFO: Starting Servlet Engine: Apache Tomcat/7.0.23

etc

Here is the APR documentation

http://tomcat.apache.org/tomcat-7.0-doc/apr.html

Download the Windows binary distribution here (or other mirror)

http://apache.mirror.uber.com.au/tomcat/tomcat-connectors/native/1.1.27/binaries/

The downloaded ZIP contains two tcnative-1.dll files for 32 and 64 bit. Depending on your Windows 2003 bit copy appropriate tcnative-1.dll to CF10\cfusion\lib . Now when you restart CF the coldfusion-error.log reports:

org.apache.catalina.core.AprLifecycleListener init

INFO: Loaded APR based Apache Tomcat Native library version.

org.apache.catalina.core.AprLifecycleListener init

INFO: APR capabilities: IPv6 [true], sendfile [true], accept filters [false],

random [true].

org.apache.coyote.AbstractProtocol init

INFO: Initializing ProtocolHandler ["ajp-apr-8012"]

org.apache.catalina.core.StandardService startInternal

INFO: Starting service Catalina

org.apache.catalina.core.StandardEngine startInternal

INFO: Starting Servlet Engine: Apache Tomcat/7.0.23

etc

Perhaps with tomcat AJP switched to use APR you might find successful outcome or even different log details which in turn might lead to a solution.

HTH, Carl.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Guest
Jul 21, 2013 Jul 21, 2013

Copy link to clipboard

Copied

Thanks Carl, I'll give the native tomcat libraries a shot.

I'll also try the minsparethreads, thanks for pointing me in right direction regarding tomcat version cf ships with. That was a bit of google fail on my part when looking up the doco.

I've used visualvm for troubleshooting in the past, but will give jconsole a try.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Guide ,
Jul 21, 2013 Jul 21, 2013

Copy link to clipboard

Copied

As you know from jvisualvm once CF java has JMX parameters applied and jconsole connects, select MBeans (managed beans) tab open Catalina (aka tomcat) - threadpool - ajp-bio-port (or ajp-apr-port depending if native library installed) – attributes then you can open thread count busy timeout to see what is happening with tomcat connector at the CF end.

EG:

Capture.JPG

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Guest
Jul 22, 2013 Jul 22, 2013

Copy link to clipboard

Copied

So still running the cf shipped tomcat library, however with jconsole hooked up.

Crashed with the following:

Permgen is using 25%

Heap is using 25%

currentThreadBusy is 0

currentThreadCount is 10

Clearly not a resource starvation issue.

I guess next step is to try the native tomcat library.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Guide ,
Jul 22, 2013 Jul 22, 2013

Copy link to clipboard

Copied

Thanks for your observations. Well indeed that would appear not overly stressed at all.

Using RDP or console Windows TASKMANGER | Processes tab, how much memory Commit Size (you might have to add that column and tick show process for all users) does the coldfusion.exe grow to?

Also might be interesting to know is CF10 using java 6 or 7 (CFadmin > System Information | Java Version and Java VM Name)?

I hope the native tomcat library (APR) helps. Carl.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Guest
Jul 23, 2013 Jul 23, 2013

Copy link to clipboard

Copied

*sigh*

Last successful request 18:03:11.40 Memory 34%

Server uptime was approx 6 hours 30 mins at crash time

Java Version 1.7.0_15

http://grab.by/oHJ4

jconsole overview

http://grab.by/oHHu

connector overview

http://grab.by/oHHy

coldfusion error

http://grab.by/oHHO

Coldfusion out log stops at 16:25 (approx 96 mins before requests cease) a rotate occurred but new file is 0 byte.

mod_jk.log

http://grab.by/oHHY

task manager

http://grab.by/oHI4

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Guide ,
Jul 23, 2013 Jul 23, 2013

Copy link to clipboard

Copied

Painful I agree. At least the error details are consistent tho do not know yet what the errors mean.

Heap is larger than 1.4Gb so I guess Win 03 and CF10 64 bit.

With monitoring the AJP threads double click the currentthreadcount currentthreadbusy and keepalivetimeout numbers (10, 0 & 0) to get a graph. There could still be something going on there unnoticed since those details would not be updating unless you were to keep pressing refresh button. 

Regards, Carl.

PS

Further CPU usage drops off at 6pm when crash occurred yet jconsole has not lost connection to CF so java would still seem to be responding something connector or tomcat wise has terminated.


The way I read the error is the same as before trying tomcat native library?


Interesting cf-error log commences having issues 20 minutes before crash. Are those errors new details since a change was applied as I read before no errors were occurring except what noticed in mod_jk log?

PPS

This detail (-XX:PermSize=192m -XX:MaxPermSize=256m) mentioned earlier applies to 32 bit it would perhaps not suit 64 bit tho I don't think dealing with a java perm problem here. Just thought I should say for interested readers.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Guest
Jul 23, 2013 Jul 23, 2013

Copy link to clipboard

Copied

The bufferoverflow is only new error since switching to native (looks like a tomcat bug fixed in latest versions).

Scheduled tasks continue to run

I can still query with jconsole etc.

Fusion reactor is still connected, I can still get metrics via it.

permgen is currently at 512M which is complete overkill but plenty of memory on box and I was merely experimenting to no effect.

cpu drops off because traffic has switched to failover server of the cluster (load balanced by a dedicated hardware load balancer).

I was refreshing the ajp threads in console but i never cause the crash at that specific time. Due to the length of time it takes to crash and the fact it happens at inconvenient times I haven't been able to catch it at that specific time. But I don't really think it would help. If it was a thread overload I can understand, but surely once the threads are free it would be able to resume. But it's a hard unrecoverable crash.

To be honest I really wish adobe didn't cludge the tomcat and give us a modified version. It really obfuscates potential root causes and even limits my ability to work around the problem (e.g mod_proxy).

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Guide ,
Jul 23, 2013 Jul 23, 2013

Copy link to clipboard

Copied

So altering to AJP to APR has not stopped crash but has given some extra log details. While that does not solve immediate problem it does provide some more leads to follow.

Threads monitoring – that is where turning that in to a graph can help because you can see history if you miss pressing refresh.

Regards, Carl.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Guide ,
Jul 23, 2013 Jul 23, 2013

Copy link to clipboard

Copied

Something else - does CF10\cfusion\runtime\logs localhost_access_log provide any useful information approx 20 minutes before and during outage?

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Guest
Jul 24, 2013 Jul 24, 2013

Copy link to clipboard

Copied

Nothing unusual, no pattern.

I've turned on historical graph but don't hold out much hope of it helping.

i think I am left with 2 choices now, convert to railo or upgrade the box to win2k12 and switch to iis.

Railo, not because I am taking my bat and ball and going home but simply because I have I don't exprience this problem working with the standard tomcat/mod_jk install on other clients. And additionally I have more control over the tomcat which gives me more options to solve this problem (e.g changing to mod_proxy_ajp)

IIS/W2k12 This is the other option.

I am not really happy with having to use either option and now I need to weigh up which will give me stability faster.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Guide ,
Jul 24, 2013 Jul 24, 2013

Copy link to clipboard

Copied

LATEST

The AJP graph will be interesting from a technical view point tho it will not stop the outage.

Railo hard for me to say. I know IIS8/W2k12/CF10 works very well for me with some connector modifications applied.

Regards, Carl.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Resources
Documentation