17 Replies Latest reply on Jul 24, 2013 3:28 PM by carl type3

    Coldfusion 10 enterprise - mod_jk keeps crashing

    derision

      Environment:

      Windows server 2003

      CF10 enterprise - update 11 installed

      Apache 2.2 with mod_ssl

       

      Error

       

      After about 8 hours of use eventually the logs will start showing this message and no further requests to coldfusion are possible. Static assets are still served by apache however.

       

      [Sun Jul 21 01:00:11 2013] [1796:1092] [error] ajp_service::jk_ajp_common.c (2470): Failed allocating AJP message buffer

      [Sun Jul 21 01:00:11 2013] [1796:1092] [info] jk_handler::mod_jk.c (2748): Service error=-5 for worker=cfusion

      [Sun Jul 21 01:01:36 2013] [1796:3492] [error] ajp_service::jk_ajp_common.c (2455): Failed allocating AJP message buffer

      [Sun Jul 21 01:01:36 2013] [1796:3492] [info] jk_handler::mod_jk.c (2748): Service error=-5 for worker=cfusion

      [Sun Jul 21 01:01:37 2013] [1796:2488] [error] ajp_service::jk_ajp_common.c (2455): Failed allocating AJP message buffer

      [Sun Jul 21 01:01:37 2013] [1796:2488] [info] jk_handler::mod_jk.c (2748): Service error=-5 for worker=cfusion

       

      Steps taken to try and address

       

      • Have reinstalled both apache and coldfusion.
      • Have reinstalled the apache connector after fully patching coldfusion
      • Have attempted to tune the workers.properties / server.xml files (e.g http://www.webtrenches.com/post.cfm/resolve-stability-problems-and-speed-up-coldfusion-10)
      • Have tried both multiple instances and single instance mode
      • Have even attempted to get mod_proxy_ajp working, but because this server hosts approximately 20 websites I would need to be able to set up virtual hosts in the web.xml, however for some reason coldfusions version of tomcat this is not working as expected so I get 404's on all index.cfm requests when browsing with a virtual host proxying via the ajp port. Same goes from straight mod_proxying.

       

      I think I am out of ideas short of swtiching to IIS. But have reservations based on the fact CF10 had workers problems with IIS as well and this is win2k3 meaning the horrible iis6 and no native rewrite rules.

        • 1. Re: Coldfusion 10 enterprise - mod_jk keeps crashing
          carl type3 Level 4

          I don't know what is going wrong for you there and CF with apache is not my environment however some things come to mind you can try.

          Are those errors reported in "mod_jk" log (cf10\config\wsconfig\N\) file? Any java errors in "cf-out" or "cf-error" logs (cf10\cfusion\logs)? Reason I ask is tomcat connector is reporting a buffer issue, tomcat uses java and sometimes one of the java memory buffers (heap new perm) could be reaching a limit.

           

          Something else that might be worth trying is add tomcat native library to use tomcat AJP apr rather than AJP bio. Weekend for me so if you want some more details on that reply and I will respond when I am at work.

           

          I favour applying similar values as those listed in the link you mention. One point of contention with those values is the worker.cfusion.connection_pool_timeout=60 in mod_jk properties and connectionTimeout= "60000" in server.xml. Some references suggest a value of 10 minutes not 1 minute so that would be pool_timeout=600 and connectionTimeout= "600000". Note you need to restart CF and apache to apply those tomcat changes.

           

          HTH, Carl.

          • 2. Re: Coldfusion 10 enterprise - mod_jk keeps crashing
            derision Level 1

            The errors are in mod_jk.log and there are no errors in out and error logs on any of the instances. The cf logs just stop at the time of the outage in cf (meaning no more requests are making it to server).

             

            Memory seems fine in fusion reactor. But your mention of it being a memory issue aligns with what I have seen when googling the error. I could try increasing permgen to see if that helps, the main heap seems ok though.

             

            Will try the 10 minute timeout.

             

            And yeah would be interested in the native tomcat library option. I have a mixture of railo and acf clients and have no issues with railo clients on mod_jk/w2k3/apache 2.2 (though the publically released mod_jk, not the custom adobe one). So I find this very odd.

            • 3. Re: Coldfusion 10 enterprise - mod_jk keeps crashing
              carl type3 Level 4

              With Permanent Generation it can be a good idea to set an initial setting. Windows 2003 32 bit I guess? Suggest use -XX:PermSize=192m -XX:MaxPermSize=256

               

              Something else I neglected to mention about tomcat settings. Like java sometimes the initial or minimum settings are more important than maximum. To that end in mod_jk properties and server.xml and thus CF stability and performance can benifit from:
              worker.cfusion.connection_pool_minsize="max"/2 (half maximum setting)
              minSpareThreads=""max"/2" (ditto half maximum)

               

              Regards, Carl

              • 4. Re: Coldfusion 10 enterprise - mod_jk keeps crashing
                derision Level 1

                Yeah I always set a min permsize....many years experience tuning and troubleshooting jvm (i also manually assign the size of newgen etc) but this is the first time I've had an issue with the connector. To be honest the heap has been fine, and I would have expected if it was a permgen issue I would have seen the usual signs in the logs. I increased more because I didn't have any better options so willing to try anything

                 

                minSpareThreads do you mean in the executor? It's not listed in the documentation as an attribute for the connector.

                 

                http://tomcat.apache.org/tomcat-6.0-doc/config/executor.html

                 

                On that note the docs seem to indicate the maxThreads defined by the executor will override any setting in the connector.  So maybe by updating the connector but not the executor based on that blog post, it's actually not doing anything at all?

                 

                The maximum number of request processing threads to be created by this Connector, which therefore determines the maximum number of simultaneous requests that can be handled. If not specified, this attribute is set to 200. If an executor is associated with this connector, this attribute is ignored as the connector will execute tasks using the executor rather than an internal thread pool.

                 

                Source: http://tomcat.apache.org/tomcat-6.0-doc/config/ajp.html

                 

                Bit confused!

                • 5. Re: Coldfusion 10 enterprise - mod_jk keeps crashing
                  carl type3 Level 4

                  First beg my pardon I notice a typo in earlier post, should read:

                  Suggest use -XX:PermSize=192m -XX:MaxPermSize=256m

                   

                  Regarding - Unless specified maxthreads=200. What I have suggested is to increase minsparethreads since you can realise a default setting of 10 can be not very many threads waiting in pool and forces tomcat to not reach it's potential always removing threads from memory to reduce it's foot print.

                   

                  Here is the references for tomcat 7, CF10 using a customised 7 not 6:

                  http://tomcat.apache.org/tomcat-7.0-doc/config/ajp.html

                  http://tomcat.apache.org/connectors-doc/reference/workers.html

                   

                  Keep in mind while I have many CF10 production servers they are all IIS and my CF10 apache knowledge is little more that setup configure just to see what it looks like.

                  To apply a minimum thread setting on IIS tomcat properties file could look similar to EG:

                   

                  worker.list=cfusion

                  worker.cfusion.type=ajp13

                  worker.cfusion.host=localhost

                  worker.cfusion.port=8012

                  worker.cfusion.max_reuse_connections=250

                  worker.cfusion.connection_pool_size = 400

                  worker.cfusion.connection_pool_minsize= 200

                  worker.cfusion.connection_pool_timeout = 600

                   

                  Correspondingly server.xml AJP portion would have this syntax EG:

                   

                  <Connector port="8012" protocol="AJP/1.3"

                  redirectPort="8445"

                  tomcatAuthentication="false"

                  maxThreads="400"

                  minSpareThreads="200"

                  connectionTimeout="600000" />

                   

                  Something important to mention because I lack enough apache. The tomcat connector and configuration references have some red apache caveats. Applying thread changes that have worked well in an IIS may not suit apache.

                   

                  You mention your familiar with JVM tuning monitoring. If you are savvy with using JVM tool jconsole (and lesser so jvisualvm) you could use that to monitor graphically what is happening with the tomcat threads. A bit hard to quickly write that up tho if you are interested I did a talk and demo on that and can find the link to recording.

                   

                  I will post some tomcat native details soon tho I discuss that as well in above mentioned talk.

                   

                  Regards, Carl.

                   

                   

                  • 6. Re: Coldfusion 10 enterprise - mod_jk keeps crashing
                    carl type3 Level 4

                    Re promised tomcat native library details.

                     

                    What you could try here is to see if APR relieves the problem. Now don't get me wrong I am just throwing idea's out there for you to try and could be going in the wrong direction. The way I see it is there are not many other log details to go on and you are tackling the issue via1)tomcat adjustments 2)java parameters and I have also suggested 3)monitoring. APR is a tomcat alteration.

                     

                    When CF starts CF10\cfusion\logs\coldfusion-error.log reports this output EG:

                     

                    org.apache.catalina.core.AprLifecycleListener init

                    INFO: The APR based Apache Tomcat Native library which allows optimal

                    performance in production environments was not found on the java.library.path:

                    d:\\ColdFusion10\\cfusion\lib;d:\\ColdFusion10\\cfusion\jintegra\bin;d:\\ColdFusio

                    n10\\cfusion\jintegra\bin\international;d:\\ColdFusion10\\cfusion\lib\oosdk\classe

                    s\win

                    org.apache.coyote.AbstractProtocol init

                    INFO: Initializing ProtocolHandler ["ajp-bio-8012"]

                    org.apache.catalina.core.StandardService startInternal

                    INFO: Starting service Catalina

                    org.apache.catalina.core.StandardEngine startInternal

                    INFO: Starting Servlet Engine: Apache Tomcat/7.0.23

                    etc

                     

                    Here is the APR documentation

                    http://tomcat.apache.org/tomcat-7.0-doc/apr.html

                     

                    Download the Windows binary distribution here (or other mirror)

                    http://apache.mirror.uber.com.au/tomcat/tomcat-connectors/native/1.1.27/binaries/

                     

                    The downloaded ZIP contains two tcnative-1.dll files for 32 and 64 bit. Depending on your Windows 2003 bit copy appropriate tcnative-1.dll to CF10\cfusion\lib . Now when you restart CF the coldfusion-error.log reports:

                     

                    org.apache.catalina.core.AprLifecycleListener init

                    INFO: Loaded APR based Apache Tomcat Native library version.

                    org.apache.catalina.core.AprLifecycleListener init

                    INFO: APR capabilities: IPv6 [true], sendfile [true], accept filters [false],

                    random [true].

                    org.apache.coyote.AbstractProtocol init

                    INFO: Initializing ProtocolHandler ["ajp-apr-8012"]

                    org.apache.catalina.core.StandardService startInternal

                    INFO: Starting service Catalina

                    org.apache.catalina.core.StandardEngine startInternal

                    INFO: Starting Servlet Engine: Apache Tomcat/7.0.23

                    etc

                     

                    Perhaps with tomcat AJP switched to use APR you might find successful outcome or even different log details which in turn might lead to a solution.

                     

                    HTH, Carl.

                    • 7. Re: Coldfusion 10 enterprise - mod_jk keeps crashing
                      derision Level 1

                      Thanks Carl, I'll give the native tomcat libraries a shot.

                       

                      I'll also try the minsparethreads, thanks for pointing me in right direction regarding tomcat version cf ships with. That was a bit of google fail on my part when looking up the doco.

                       

                      I've used visualvm for troubleshooting in the past, but will give jconsole a try.

                      • 8. Re: Coldfusion 10 enterprise - mod_jk keeps crashing
                        carl type3 Level 4

                        As you know from jvisualvm once CF java has JMX parameters applied and jconsole connects, select MBeans (managed beans) tab open Catalina (aka tomcat) - threadpool - ajp-bio-port (or ajp-apr-port depending if native library installed) – attributes then you can open thread count busy timeout to see what is happening with tomcat connector at the CF end.

                        EG:

                        Capture.JPG

                        • 9. Re: Coldfusion 10 enterprise - mod_jk keeps crashing
                          derision Level 1

                          So still running the cf shipped tomcat library, however with jconsole hooked up.

                           

                          Crashed with the following:

                           

                          Permgen is using 25%

                          Heap is using 25%

                          currentThreadBusy is 0

                          currentThreadCount is 10

                           

                          Clearly not a resource starvation issue.

                           

                          I guess next step is to try the native tomcat library.

                          • 10. Re: Coldfusion 10 enterprise - mod_jk keeps crashing
                            carl type3 Level 4

                            Thanks for your observations. Well indeed that would appear not overly stressed at all.

                             

                            Using RDP or console Windows TASKMANGER | Processes tab, how much memory Commit Size (you might have to add that column and tick show process for all users) does the coldfusion.exe grow to?

                             

                            Also might be interesting to know is CF10 using java 6 or 7 (CFadmin > System Information | Java Version and Java VM Name)?

                             

                            I hope the native tomcat library (APR) helps. Carl.

                            • 11. Re: Coldfusion 10 enterprise - mod_jk keeps crashing
                              derision Level 1

                              *sigh*

                               

                              Last successful request 18:03:11.40 Memory 34%

                              Server uptime was approx 6 hours 30 mins at crash time

                               

                              Java Version 1.7.0_15

                               

                              http://grab.by/oHJ4

                               

                              jconsole overview

                               

                               

                              http://grab.by/oHHu

                               

                               

                              connector overview

                               

                               

                              http://grab.by/oHHy

                               

                               

                              coldfusion error

                               

                               

                              http://grab.by/oHHO

                               

                               

                              Coldfusion out log stops at 16:25 (approx 96 mins before requests cease) a rotate occurred but new file is 0 byte.

                               

                               

                              mod_jk.log

                               

                               

                              http://grab.by/oHHY

                               

                               

                              task manager

                               

                               

                              http://grab.by/oHI4

                              • 12. Re: Coldfusion 10 enterprise - mod_jk keeps crashing
                                carl type3 Level 4

                                Painful I agree. At least the error details are consistent tho do not know yet what the errors mean.

                                 

                                Heap is larger than 1.4Gb so I guess Win 03 and CF10 64 bit.

                                 

                                With monitoring the AJP threads double click the currentthreadcount currentthreadbusy and keepalivetimeout numbers (10, 0 & 0) to get a graph. There could still be something going on there unnoticed since those details would not be updating unless you were to keep pressing refresh button. 

                                 

                                Regards, Carl.

                                 

                                PS

                                 

                                Further CPU usage drops off at 6pm when crash occurred yet jconsole has not lost connection to CF so java would still seem to be responding something connector or tomcat wise has terminated.


                                The way I read the error is the same as before trying tomcat native library?


                                Interesting cf-error log commences having issues 20 minutes before crash. Are those errors new details since a change was applied as I read before no errors were occurring except what noticed in mod_jk log?

                                 

                                PPS

                                This detail (-XX:PermSize=192m -XX:MaxPermSize=256m) mentioned earlier applies to 32 bit it would perhaps not suit 64 bit tho I don't think dealing with a java perm problem here. Just thought I should say for interested readers.

                                • 13. Re: Coldfusion 10 enterprise - mod_jk keeps crashing
                                  derision Level 1

                                  The bufferoverflow is only new error since switching to native (looks like a tomcat bug fixed in latest versions).

                                   

                                  Scheduled tasks continue to run

                                  I can still query with jconsole etc.

                                  Fusion reactor is still connected, I can still get metrics via it.

                                   

                                  permgen is currently at 512M which is complete overkill but plenty of memory on box and I was merely experimenting to no effect.

                                   

                                  cpu drops off because traffic has switched to failover server of the cluster (load balanced by a dedicated hardware load balancer).

                                   

                                  I was refreshing the ajp threads in console but i never cause the crash at that specific time. Due to the length of time it takes to crash and the fact it happens at inconvenient times I haven't been able to catch it at that specific time. But I don't really think it would help. If it was a thread overload I can understand, but surely once the threads are free it would be able to resume. But it's a hard unrecoverable crash.

                                   

                                  To be honest I really wish adobe didn't cludge the tomcat and give us a modified version. It really obfuscates potential root causes and even limits my ability to work around the problem (e.g mod_proxy).

                                  • 14. Re: Coldfusion 10 enterprise - mod_jk keeps crashing
                                    carl type3 Level 4

                                    So altering to AJP to APR has not stopped crash but has given some extra log details. While that does not solve immediate problem it does provide some more leads to follow.

                                     

                                    Threads monitoring – that is where turning that in to a graph can help because you can see history if you miss pressing refresh.

                                     

                                    Regards, Carl.

                                    • 15. Re: Coldfusion 10 enterprise - mod_jk keeps crashing
                                      carl type3 Level 4

                                      Something else - does CF10\cfusion\runtime\logs localhost_access_log provide any useful information approx 20 minutes before and during outage?

                                      • 16. Re: Coldfusion 10 enterprise - mod_jk keeps crashing
                                        derision Level 1

                                        Nothing unusual, no pattern.

                                         

                                        I've turned on historical graph but don't hold out much hope of it helping.

                                         

                                        i think I am left with 2 choices now, convert to railo or upgrade the box to win2k12 and switch to iis.

                                         

                                        Railo, not because I am taking my bat and ball and going home but simply because I have I don't exprience this problem working with the standard tomcat/mod_jk install on other clients. And additionally I have more control over the tomcat which gives me more options to solve this problem (e.g changing to mod_proxy_ajp)

                                         

                                        IIS/W2k12 This is the other option.

                                         

                                        I am not really happy with having to use either option and now I need to weigh up which will give me stability faster.

                                        • 17. Re: Coldfusion 10 enterprise - mod_jk keeps crashing
                                          carl type3 Level 4

                                          The AJP graph will be interesting from a technical view point tho it will not stop the outage.

                                           

                                          Railo hard for me to say. I know IIS8/W2k12/CF10 works very well for me with some connector modifications applied.

                                           

                                          Regards, Carl.