28 Replies Latest reply: Jun 26, 2012 8:16 AM by Joop Kaashoek RSS

    Bad performance on CF9 after upgrade from CF8

    erloz Community Member

      Hello and happy new year for everybody 

       

      I'm stuck with a problem after upgrade from CF8 entreprise to CF9 entreprise.

      We are using IIS6 on WIN2003R2 (SP2).

       

      There are a lot of  "503 Server Error" with no content. (about 0.05% of request). This can be occured on big or small CFM, graphic ressources, etc...

      For each error I can see the following error on the connector logs (jrun_iis6_wildcardYYMMDD.log).

      2011-01-02 19:00:05 jrISAPI[1456:3940]  returning error page for JRun too busy or out of memory

       

      I've never seen this message in previous CF connector version.

       

      And on the jrun log (coldfusion-outXXX.log), I read the following errors:

      02/01 20:53:31 error unexpected end of part
      java.io.IOException: unexpected end of part
          at com.oreilly.servlet.multipart.PartInputStream.fill(PartInputStream.java:96)
          at com.oreilly.servlet.multipart.PartInputStream.read(PartInputStream.java:191)
          at com.oreilly.servlet.multipart.PartInputStream.read(PartInputStream.java:152)
          at com.oreilly.servlet.multipart.FilePart.write(FilePart.java:257)
          at com.oreilly.servlet.multipart.FilePart.writeTo(FilePart.java:215)
          at coldfusion.filter.FormScope.fillForm(FormScope.java:252)
          at coldfusion.filter.FusionContext.SymTab_initForRequest(FusionContext.java:376)
          at coldfusion.filter.GlobalsFilter.invoke(GlobalsFilter.java:33)
          at coldfusion.filter.DatasourceFilter.invoke(DatasourceFilter.java:22)
          at coldfusion.filter.CachingFilter.invoke(CachingFilter.java:53)
          at coldfusion.filter.RequestThrottleFilter.invoke(RequestThrottleFilter.java:126)
          at coldfusion.CfmServlet.service(CfmServlet.java:200)
          at coldfusion.bootstrap.BootstrapServlet.service(BootstrapServlet.java:89)
          at jrun.servlet.FilterChain.doFilter(FilterChain.java:86)
          at coldfusion.monitor.event.MonitoringServletFilter.doFilter(MonitoringServletFilter.java:42 )
          at coldfusion.bootstrap.BootstrapFilter.doFilter(BootstrapFilter.java:46)
          at jrun.servlet.FilterChain.doFilter(FilterChain.java:94)
          at jrun.servlet.FilterChain.service(FilterChain.java:101)
          at jrun.servlet.ServletInvoker.invoke(ServletInvoker.java:106)
          at jrun.servlet.JRunInvokerChain.invokeNext(JRunInvokerChain.java:42)
          at jrun.servlet.JRunRequestDispatcher.invoke(JRunRequestDispatcher.java:286)
          at jrun.servlet.ServletEngineService.dispatch(ServletEngineService.java:543)
          at jrun.servlet.jrpp.JRunProxyService.invokeRunnable(JRunProxyService.java:203)
          at jrunx.scheduler.ThreadPool$DownstreamMetrics.invokeRunnable(ThreadPool.java:320)
          at jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.java:428)
          at jrunx.scheduler.ThreadPool$UpstreamMetrics.invokeRunnable(ThreadPool.java:266)
          at jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)

      02/01 20:53:31 error (JRun Service: ProxyService [jrun.servlet.jrpp.JRunProxyService@4006ca]) JRunPRoxyServer.invokeRunnable:
      java.lang.IllegalStateException
          at jrun.servlet.JRunResponse.getWriter(JRunResponse.java:205)
          at jrun.servlet.JRunResponse.sendError(JRunResponse.java:597)
          at jrun.servlet.JRunRequestDispatcher.invoke(JRunRequestDispatcher.java:328)
          at jrun.servlet.ServletEngineService.dispatch(ServletEngineService.java:543)
          at jrun.servlet.jrpp.JRunProxyService.invokeRunnable(JRunProxyService.java:203)
          at jrunx.scheduler.ThreadPool$DownstreamMetrics.invokeRunnable(ThreadPool.java:320)
          at jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.java:428)
          at jrunx.scheduler.ThreadPool$UpstreamMetrics.invokeRunnable(ThreadPool.java:266)
          at jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)

      java.lang.IllegalStateException
          at jrun.servlet.JRunResponse.getWriter(JRunResponse.java:205)
          at jrun.servlet.JRunResponse.sendError(JRunResponse.java:597)
          at jrun.servlet.JRunRequestDispatcher.invoke(JRunRequestDispatcher.java:328)
          at jrun.servlet.ServletEngineService.dispatch(ServletEngineService.java:543)
          at jrun.servlet.jrpp.JRunProxyService.invokeRunnable(JRunProxyService.java:203)
          at jrunx.scheduler.ThreadPool$DownstreamMetrics.invokeRunnable(ThreadPool.java:320)
          at jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.java:428)
          at jrunx.scheduler.ThreadPool$UpstreamMetrics.invokeRunnable(ThreadPool.java:266)
          at jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)

       

       

      Connector settings (jrun_iis6_wildcard.ini) :

       

      • verbose=false
      • Buffer=true (I've tried with false)
      • serverstore=C:/ColdFusion9/runtime/lib/wsconfig/1/jrunserver.store
      • bootstrap=127.0.0.1:51800
      • apialloc=false
      • maxworkerthreads=100 (I've tried to increase/decrease this value)

       

      JRUN settings:

       

      • Threads number: 150
      • Max queued jrun threads: 150

       

      Is anyone have experience the same issue ? How can I try to solve this ?

       

      Big thanks!

        • 1. Re: Bad performance on CF9 after upgrade from CF8
          Steven Erat CommunityMVP

          Right off the bat, I suspect that setting the JRun thread pool size to 150

          is probably hurting you during periods of higher traffic.  See my blog entry

          for more:

          http://www.talkingtree.com/blog/index.cfm/2009/8/3/ColdFusion-Request-Tuning-Settings-in-D epth

           

          Also, you mention that the problem occurs on CFM files and graphic

          resources.  This is an important distinction.  One is dynamic content, the

          other is static content.  If you are experiencing an increase in 503 errors

          for static content (html, gif, jpg, css, js) as verified in the IIS logs

          then you may have a problem that is not related to ColdFusion.

          • 2. Re: Bad performance on CF9 after upgrade from CF8
            Dave Watts CommunityMVP

            Hi, Steven!

             

            The graphics might still be a problem with CF/JRun, as the default context root means that the JRun connector receives every URL before deciding whether to process it with CF or return it to IIS. IIS itself isn't going to generate a 5xx error for static content.

             

            Dave Watts, CTO, Fig Leaf Software

            http://www.figleaf.com/

            http://training.figleaf.com/

            • 3. Re: Bad performance on CF9 after upgrade from CF8
              Steven Erat CommunityMVP

              Hi Dave,

               

              Its been a few years since I dug into connector filtering (big caveat as

              this is a bit of a meander and some of this is thinking out loud to refresh

              myself), but as I recall the connector property ignoresuffixmap has a

              default of true, and the result of which should be to not send all URL

              patterns to JRun for processing, but rather to let IIS determine if a

              request URL should be handled by IIS, by JRun, or by something else.

               

              I believe the sequence of events is that when the connector starts up it

              gathers a list of servlet mappings from JRun for things like *.cfm, or

              /flashremoting/, etc and that when ignoresuffixmap is true a request for

              /foo.cfm would show in the verbose connector logs as "filtering ignored

              suffix .cfm" followed by "filterRequest no match", then control would return

              to IIS for further extension mapping.  If ignoresuffixmap were changed to

              false, then the connector would begin attempting to pattern match the URL to

              known servlet mappings for the JRun server and then send them to JRun

              immediately for processing without letting IIS inspect the URL further.

               

              If there are no servlet mappings for static content types, and IIS is not

              configured to send static content types to CF (as some people do "hide" the

              app server type by using .html extensions instead of .cfm), then no matter

              the value of ignoresuffixmap, the connector wouldn't send the URL to be

              processed by JRun.  That means that if JRun were down/busy/unavailable to

              process requests, it wouldn't affect requests for static content types of

              html, css, js, jpg, etc because the connector would never send the request

              over to JRun.

               

              I suppose, however, that if there were some kind of problem with the

              connector's ability to perform the filtering (including filtering to ignore

              suffixes), then the connector could (possibly?) be the culprit returning 503

              status codes back to IIS even though the actual JRun server itself was never

              brought into the loop.  ... Finally, this is the "Ah ha!" moment when I

              realize that you are correct.

               

              So in short: What Dave said.  It could be the connector.  You might want to

              enable connector logging as the next step in finding out to what's going on.

              • 4. Re: Bad performance on CF9 after upgrade from CF8
                erloz Community Member

                I will try to play with extensions management to ignore static content, but with CF8, all content are handle by JRun, and I've never seen "returning error page for JRun too busy or out of memory".

                Do you know exactly when this log output is displayed ? I cannot enable verbose log when server is in production (really too big), and I didn't succeed to reproduce the case.

                 

                However, I've captured a connection with wireshark when the error has occured. There is no delay during the request, so I think the problem is not linked with sort of timeout or something like that.

                 

                GET / HTTP/1.1

                User-Agent: check_http/v1861 (nagios-plugins 1.4.11)

                Connection: close

                Host: xxxxxxxxx


                HTTP/1.1 503 Server Error

                Connection: close

                Date: Thu, 30 Dec 2010 01:26:40 GMT

                Server: Microsoft-IIS/6.0

                content-type: text/html


                (no more content in response)

                 

                Edit: I've enable verbose log... and checking disk space

                • 5. Re: Bad performance on CF9 after upgrade from CF8
                  Steven Erat CommunityMVP


                  For now, I'd start with the basics. 

                   

                  - Tuning the request pools (JRun active handler request pool, then secondarily the 4 CF request pools cfm/cfc/ws/fr).  I still suspect that having the JRun running request pool (maxactivehandlers) set to 150 will a problem at some point.  During any heavy traffic, the JVM could work the CPUs too hard and unnecessarily jack up memory requirements until the server goes into a thrashing state.  I could be wrong since I don't know your hardware or your application, but my experience with helping others tune servers has shown that lower numbers produce better throughput and reduce/eliminate the thrashing of resources.

                   

                  - Analyze the JRun logs for "Java heap space", OutOfMemoryError, and watch the JRun/bin directory for the presence of Hotspot crash logs (logs starting with "hs..."  Can you trace down any of these at all?  There's a whole line of questioning to follow up on this if found.

                   

                  - Can you enable JRun metrics logging? This will track active/busy request count as well as memory used/free.  It would be good to know if there is a relationship between high values for reqs or memory with the start of the 503 errors.

                   

                  Also, when exactly do the 503 errors start?  Would you say it happens right after CF/IIS restart, or only after a surge in traffic, or seemingly random?  Can you correlate the frequency of the errors with any other events like maybe some scheduled tasks or ... ?

                   

                  Regarding the connector, I found one possibly related thread about how a change to web.xml to add a new extension for CF files had caused 503s.  See comments by "Cheve"at

                  http://www.simonwhatley.co.uk/poor-mans-http-compression-with-coldfusion

                   

                  Also found a CF bug where a developer was purposefully creating a memory leak to test a scenario, and it resulted in the 503 error, correlated with the Java Heap Space error message: http://www.elliottsprehn.com/cfbugs/bugs/78721#stack-trace   The bug is marked closed, but with no comments by Adobe about its status or any explanation.   I just re-read your original post, and I see now that your CF9 connector logs were mentioning a possible memory situation "JRun too busy or out of memory" and that CF8 did not.  So that might be more reason to seek down evidence of memory issues in the logs and try to trace it back to the app.  I also noticed that your original post had stack traces that also included IllegalStateException errors at the same time, and that bug report has a similar pattern of showing that exception right around the time of the memory error.  In your case, preceding that is the com.oreilly stack trace refering to unexpected end of part.  That com.oreilly library is responsible for handling file uploads, so you may want to purse that angle as well.

                   

                  Hope this helps

                  • 6. Re: Bad performance on CF9 after upgrade from CF8
                    erloz Community Member

                    Here is the day report.

                     

                    I've enable verbose log on connector almost all the day. I've seen ZERO "returning error page for JRun too busy or out of memory" in my multigigabit log! Around 5:00 PM I've turned off debug, and in the 30 minutes after, 3 lines "returning error page for JRun too busy or out of memory" ... With the 3 "503" error in my IIS log file.

                     

                    Or I've very unlucky, or the connector is really boggy!!!??

                     

                    For now, I've turned on metrics on JRun.

                     

                    Connector log:

                     

                    2011-01-04 18:25:55 jrISAPI[3912:4140]  returning error page for JRun too busy or out of memory

                    2011-01-04 18:26:26 jrISAPI[9412:4140]  returning error page for JRun too busy or out of memory

                    2011-01-04 18:26:30 jrISAPI[1532:4140]  returning error page for JRun too busy or out of memory

                     

                    JRun metrics:

                     

                     

                    04/01 18:24:00 metrics Web threads (busy/total/listen/idle/delay/drop/handle): 14/40/22/4/0/0/147 Sessions: 25 Total Memory=1036288 Free=891897

                    04/01 18:25:00 metrics Web threads (busy/total/listen/idle/delay/drop/handle): 12/40/22/6/0/0/109 Sessions: 34 Total Memory=1038208 Free=947887

                    04/01 18:26:00 metrics Web threads (busy/total/listen/idle/delay/drop/handle): 13/42/23/6/0/0/203 Sessions: 42 Total Memory=1039296 Free=940044

                    04/01 18:27:00 metrics Web threads (busy/total/listen/idle/delay/drop/handle): 10/42/25/7/0/0/232 Sessions: 54 Total Memory=1037760 Free=937702

                    04/01 18:28:00 metrics Web threads (busy/total/listen/idle/delay/drop/handle): 13/42/25/4/0/0/247 Sessions: 64 Total Memory=1040768 Free=889198

                    04/01 18:29:00 metrics Web threads (busy/total/listen/idle/delay/drop/handle): 12/42/25/5/0/0/149 Sessions: 69 Total Memory=1041856 Free=877038

                    04/01 18:30:00 metrics Web threads (busy/total/listen/idle/delay/drop/handle): 13/42/25/4/0/0/147 Sessions: 74 Total Memory=1042240 Free=874367

                     

                    IIS Logs:

                     

                    2011-01-04 18:26:26 GET / - - XX.XX.XX.XX HTTP/1.1 503 137
                    2011-01-04 18:26:30 GET /fdsfsdqfsdqfsdqfqsdqf/83451815_0.jpg - - XX.XX.XX.XX HTTP/1.1 503 137
                    2011-01-04 18:31:49 POST /fdsfsdfdsf/upload.cfm var=value - XX.XX.XX.XX HTTP/1.1 503 137

                     

                    The 1st is a CFM ressource, the 2nd is a static ressource, and the last one is a POST to a CFM ressource. For the 3 requests I have the same stack trace on JRun log I've put on my first post.

                     

                    The JRun logs have no entry about a memory issue, and it seems to appear at random moment. Obviously I've more frequent under heady load, but when you find a 503 at the timestamp "2011-01-01 01:43:03 jrISAPI[7164:2380]  returning error page for JRun too busy or out of memory", I think there is not a lot of people that spent time on our website on the new year night

                     

                    There is no scheduled task in the server, and I didn't find any hs... file in my runtime\bin directory (never crash?).

                     

                    I'm lost...

                     

                    Can you explain me the value handledRq (the last one) on my metrics logs ??

                    • 7. Re: Bad performance on CF9 after upgrade from CF8
                      Steven Erat CommunityMVP

                      I believe that handledRq refers to how many requests were handled or

                      processed to completion during the last metric logging interval (1 minute).

                       

                      The metrics logging shows that memory is fine (free remains high), and that

                      you have about 10-14 active requests at the moment of logging, with a high

                      water mark of 42 handler threads with 5 or 6 that have been inactive long

                      enough to be considered idle.  The requests handled per minute has a 50%+

                      increase between 18:24 and 18:27, a minor surge, but still none shown for

                      delay.

                       

                      Seven years ago I once blogged about how if JRun queues requests and those

                      queued requests timeout there without ever running that JRun will return a

                      503 status code with the error message "returning error page for JRun too

                      busy or out of memory".  However, it doesn't appear to be the case here

                      since your JRun metrics shows 0 for delay.

                       

                      The information you provided so far has been helpful, but at this point

                      there is still not enough for me to make a clear conclusion.  As the next

                      step I'd take several "snapshots" from the Server Monitor while the 503

                      errors are occurring (snapshots are triggered via a web request, so

                      generating a thread dump directly on the OS to bypass the web may be a

                      better option).  The snapshots/threaddumps will show you exactly what

                      threads are running, how many, what they are doing down to the line of code

                      (java source code, not just cfml code).

                       

                      Also, you may also want to contact a ColdFusion consultant to help you

                      troubleshoot this for you further.  I recommend FigLeaf.com or go2ria.com.

                      • 8. Re: Bad performance on CF9 after upgrade from CF8
                        Steven Erat CommunityMVP

                        Although I'm certainly willing to continue trying to help out here with the

                        problem under discussion, this form of help might be to slow for you. So

                        once again I'll suggest that if this is an urgent business need, then it's

                        really in your best interest to hire a dedicated and experienced consultant

                        to help resolve this.

                         

                        In addition to the consultants I mentioned earlier, I'll point out that

                        Charlie Arehart maintains a very helpful resource center, CF411.com,

                        intended to help people in just this situation.  One of the resources is a

                        larger list of CF consulting companies.  Charlie himself is known to be

                        among the very best at diving into all manner of ColdFusion performance

                        issues.

                         

                        See this for more:

                         

                        CF-oriented Troubleshooting Consultant

                        http://www.cf411.com/#cfassist

                        • 9. Re: Bad performance on CF9 after upgrade from CF8
                          charlie@carehart.org CommunityMVP

                          Thanks for that, Steven. I've stayed quiet in this thread because so many others had jumped in and were offering useful info.

                           

                          I'll add this, though: erloz, you refer to errors you're seeing in the web server connector logs. What are you seeing instead in the \runtime\logs (or \logs, if on Multiserver)? I know you've seen errors related to non-CF files, but given the wildcard filter in IIS and how the connector does seem to at least look at non-CF files briefly, what you may see in the connector logs may be more of the result of a more serious error having happened first in processing CF pages that may make it non-responsive to the connector. That's a bit of a wild guess, but it's worth checking. In particular, look in those logs for any "outofmemory" errors (search for that specific string).

                           

                          I have a blog entry that helps address some of what you may see if you find them:

                           

                          http://www.carehart.org/blog/client/index.cfm/2010/11/3/when_memory_problems_arent_what_th ey_seem_part_1

                           

                          Finally, you don't make it clear: are these errors happening on ALL requests, all the time? You referred to it as part of something happening on an upgrade from CF 8 to 9. Has this been since the upgrade? Constant? Periodic? Any additional info may help.

                           

                          /charlie

                          • 10. Re: Bad performance on CF9 after upgrade from CF8
                            carl type3 Community Member

                            Hello,

                             

                            Looking thru the thread you do not say what is the bit 32 or 64? I mention because on 64 bit I have noticed the default MaxPermSize inadequate, which leads to many major Garbage Collections - part of major GC being stop all Java process hence bad for performance.

                             

                            With the Java messages in the OUT log I am curious as to the JVM arguments in CF\runtime\bin\JVM.CONFIG?

                             

                            I am not entirely sure CF JVM is the cause of problem however I would not discount it particularly since the jrun_iis6_wildcardYYMMDD.log - returning error page for JRun too busy or out of memory (tho I not seen that one before myself).

                             

                            HTH, Carl.
                            • 11. Re: Bad performance on CF9 after upgrade from CF8
                              erloz Community Member

                              @Charlie:

                               

                              in the cf\runtime\logs there is only that kind of information:

                               

                              05/01 10:28:44 error unexpected end of part
                              java.io.IOException: unexpected end of part
                                  at com.oreilly.servlet.multipart.PartInputStream.fill(PartInputStream.java:96)
                                  at com.oreilly.servlet.multipart.PartInputStream.read(PartInputStream.java:191)
                                  at com.oreilly.servlet.multipart.PartInputStream.read(PartInputStream.java:152)
                                  at com.oreilly.servlet.multipart.FilePart.write(FilePart.java:257)
                                  at com.oreilly.servlet.multipart.FilePart.writeTo(FilePart.java:215)
                                  at coldfusion.filter.FormScope.fillForm(FormScope.java:252)
                                  at coldfusion.filter.FusionContext.SymTab_initForRequest(FusionContext.java:376)
                                  at coldfusion.filter.GlobalsFilter.invoke(GlobalsFilter.java:33)
                                  at coldfusion.filter.DatasourceFilter.invoke(DatasourceFilter.java:22)
                                  at coldfusion.filter.CachingFilter.invoke(CachingFilter.java:53)
                                  at coldfusion.filter.RequestThrottleFilter.invoke(RequestThrottleFilter.java:126)
                                  at coldfusion.CfmServlet.service(CfmServlet.java:200)
                                  at coldfusion.bootstrap.BootstrapServlet.service(BootstrapServlet.java:89)
                                  at jrun.servlet.FilterChain.doFilter(FilterChain.java:86)
                                  at coldfusion.monitor.event.MonitoringServletFilter.doFilter(MonitoringServletFilter.java:42 )
                                  at coldfusion.bootstrap.BootstrapFilter.doFilter(BootstrapFilter.java:46)
                                  at jrun.servlet.FilterChain.doFilter(FilterChain.java:94)
                                  at jrun.servlet.FilterChain.service(FilterChain.java:101)
                                  at jrun.servlet.ServletInvoker.invoke(ServletInvoker.java:106)
                                  at jrun.servlet.JRunInvokerChain.invokeNext(JRunInvokerChain.java:42)
                                  at jrun.servlet.JRunRequestDispatcher.invoke(JRunRequestDispatcher.java:286)
                                  at jrun.servlet.ServletEngineService.dispatch(ServletEngineService.java:543)
                                  at jrun.servlet.jrpp.JRunProxyService.invokeRunnable(JRunProxyService.java:203)
                                  at jrunx.scheduler.ThreadPool$DownstreamMetrics.invokeRunnable(ThreadPool.java:320)
                                  at jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.java:428)
                                  at jrunx.scheduler.ThreadPool$UpstreamMetrics.invokeRunnable(ThreadPool.java:266)
                                  at jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)

                              05/01 10:28:44 error (JRun Service: ProxyService [jrun.servlet.jrpp.JRunProxyService@16c00e2]) JRunPRoxyServer.invokeRunnable:
                              java.lang.IllegalStateException
                                  at jrun.servlet.JRunResponse.getWriter(JRunResponse.java:205)
                                  at jrun.servlet.JRunResponse.sendError(JRunResponse.java:597)
                                  at jrun.servlet.JRunRequestDispatcher.invoke(JRunRequestDispatcher.java:328)
                                  at jrun.servlet.ServletEngineService.dispatch(ServletEngineService.java:543)
                                  at jrun.servlet.jrpp.JRunProxyService.invokeRunnable(JRunProxyService.java:203)
                                  at jrunx.scheduler.ThreadPool$DownstreamMetrics.invokeRunnable(ThreadPool.java:320)
                                  at jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.java:428)
                                  at jrunx.scheduler.ThreadPool$UpstreamMetrics.invokeRunnable(ThreadPool.java:266)
                                  at jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)

                              java.lang.IllegalStateException
                                  at jrun.servlet.JRunResponse.getWriter(JRunResponse.java:205)
                                  at jrun.servlet.JRunResponse.sendError(JRunResponse.java:597)
                                  at jrun.servlet.JRunRequestDispatcher.invoke(JRunRequestDispatcher.java:328)
                                  at jrun.servlet.ServletEngineService.dispatch(ServletEngineService.java:543)
                                  at jrun.servlet.jrpp.JRunProxyService.invokeRunnable(JRunProxyService.java:203)
                                  at jrunx.scheduler.ThreadPool$DownstreamMetrics.invokeRunnable(ThreadPool.java:320)
                                  at jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.java:428)
                                  at jrunx.scheduler.ThreadPool$UpstreamMetrics.invokeRunnable(ThreadPool.java:266)
                                  at jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)

                               

                              There is no line about a memory issue!

                               

                              These errors happening on random request not ALL!! (arround 0.05%), and there is no real link between load and this... Our applications handle a lot of POST download, but like I've described before, error can occured on classic cfm page and static ressource...

                               

                              What I mean with upgrade CF8 to CF9, is with our 10 other servers running on CF8, I've never see that 503 issue outside a real jrun problem (like a memory crash, restart, etc...)

                               

                               

                              @Carl:

                               

                              Server is running on WIN2003R2SP2 32 bits OS (dual Xeon E5506, 4Go RAM). Here is the jvm.config arguments:

                               

                              # Arguments to VM
                              java.args=-server  -Xms1024m -Xmx1024m -Dsun.io.useCanonCaches=false -XX:PermSize=256m -XX:MaxPermSize=512m -XX:+UseParallelGC -Dcoldfusion.rootDir={application.home}/../ -Dcoldfusion.libPath={application.home}/../lib -Dcoldfusion.classPath={application.home}/../lib/updates,{application.home}/../lib,{appli cation.home}/../gateway/lib/,{application.home}/../wwwroot/WEB-INF/flex/jars,{application. home}/../wwwroot/WEB-INF/cfform/jars

                               

                              Do you know if it should be possible to try old JRun connector used by CF8 ?

                              • 12. Re: Bad performance on CF9 after upgrade from CF8
                                Steven Erat CommunityMVP

                                A couple points based on latest discussion:

                                 

                                1) Permanent Generation size is not a problem unless you start seeing

                                "java.lang.OutOfMemoryError PermGen Space".

                                 

                                2) People keep referring to memory errors because the error message suggests

                                it as a possibility, "JRun too busy or out of memory".  But that's just the

                                connector taking a guess as to why JRun server isn't responding to the

                                connector.

                                 

                                3) You should not use a CF8 connector for a CF9 server because the connector

                                is versioned for JRun builds, so the hooks or communication is tightly

                                coupled between them.  Attempting to do that will likely cause additional

                                problems.

                                 

                                In the past I demonstrated how the 503 status code with the "JRun too busy

                                or out of memory" message can be caused by JRun requests timing out in the

                                queue before getting to run.  But based on the JRun metrics log you provided

                                it did not appear to be the problem because requests were not queuing.  I

                                still think you need a more comprehensive analysis to include settings

                                review, log review, and snapshot/thread dump analysis.

                                • 13. Re: Bad performance on CF9 after upgrade from CF8
                                  Steven Erat CommunityMVP

                                  Continuing to research this, I ran across this thread which discusses how a

                                  503 status code might be returned when a user is performing a file upload

                                  and the user cancels the request.  More details at:

                                  http://www.houseoffusion.com/groups/cf-talk/thread.cfm/threadid:38705

                                   

                                  Perhaps the associated static content 503 errors are occur from the same

                                  user, say there was a form to upload where the action page had static

                                  content like css, jpg, js, etc, and the form upload cancellation caused all

                                  resource requests for the action page to return a 503.  Just a hunch.

                                  • 14. Re: Bad performance on CF9 after upgrade from CF8
                                    charlie@carehart.org CommunityMVP

                                    Erloz, are you saying you just eyeballed the logs (to find any outofmemory errors) or that you searched the logs? You really must do the latter, if you want to be absolutely sure it's not an issue. You could miss them on one line, especially with all that other stuff (which in my experience is more from the result of other problems, not the cause of them). Again, I address how best to do that search in my blog entry.

                                     

                                    Steven's idea about stopped uploads is certainly interesting, too, if there's some limit to when they happen for you.

                                     

                                    Finally, have you searched your web server logs to find the requests getting 503 errors? You may find a patter, whether in the domain, the filename, the IP address, the request method, the time, etc.

                                     

                                    /charlie

                                    • 15. Re: Bad performance on CF9 after upgrade from CF8
                                      erloz Community Member

                                      @Charlie:

                                       

                                      Yes I've made a big "grep" on my logfile, and no result about memory...

                                       

                                      I've just discovered a very strange behaviour:

                                       

                                      I've created a page which generate an oufofmemory:

                                       

                                      bigoutput.cfm:

                                       

                                      <cffile action="read" variable="toto" file="C:\bigfile.content">

                                      <cfoutput>#toto#</cfoutput>

                                       

                                      This generate a 500 error. OK.

                                       

                                      The next call to a CFM ressource (or something map to servlet) is everytime a 503 !!!!!

                                       

                                      Example:

                                       

                                      erlo@bacon:~/debugcf$ wget http://server/bigoutput.cfm
                                      --2011-01-05 18:02:56--  http://server/bigoutput.cfm
                                      Resolving server... XX.XX.XX.XX
                                      Connecting to server|XX.XX.XX.XX|:80... connected.
                                      HTTP request sent, awaiting response... 500 Internal Server Error
                                      2011-01-05 18:03:00 ERROR 500: Internal Server Error.

                                      erlo@bacon:~/debugcf$ wget http://server/calltoapagewhichdoesnotexist.cfm
                                      --2011-01-05 18:03:47--  http://server/calltoapagewhichdoesnotexist.cfm
                                      Resolving server... XX.XX.XX.XX
                                      Connecting to serverXX.XX.XX.XX|:80... connected.
                                      HTTP request sent, awaiting response... 503 Server Error
                                      2011-01-05 18:03:47 ERROR 503: Server Error.

                                      In my example, I call a ressource which does not exist (but is map to a servlet) arround 50 seconds after the previous error. I can reproduce this everytime.. It is the same behaviour with an existing cfm ressource, but not with a static ressource.

                                       

                                      This seems to be a bug, and does not explain the 503 on static content, and for the moment I cannot find a 500 just before my list of 503 error on production environnement.

                                       

                                      edit:

                                       

                                      I've tried on CF8 server, and I cannot reproduce this behaviour.


                                      A think it is really "connector related": if I stop coldfusion after my 500 error, I can have the same immediate 503 error..... the timeout produced by stopping the service is visible only on the 2nd request..... ARGH

                                      • 16. Re: Bad performance on CF9 after upgrade from CF8
                                        Steven Erat CommunityMVP

                                        If you monitor the total heap size (used/allocated/max) and watch for

                                        garbage collection, it might be that when the server hits an

                                        outofmemoryerror it will continue to not respond to the connector until GC

                                        kicks in and frees some memory up.  I suspect that GC runs and completes

                                        just before the server starts processing requests again.  HTTP 503 status

                                        code is kind of a catch all for server not available, which would be the

                                        case after an OOME for a brief period.

                                        • 17. Re: Bad performance on CF9 after upgrade from CF8
                                          charlie@carehart.org CommunityMVP

                                          That's an interesting observation, but then I would ask: are you seeing the outofmemory error in the runtime log when you run the test? If so, then if your theory was correct, you'd expect to have found other outofmemory errors from the preceding times this was happening.

                                           

                                          So if you don't mind, while you may want to focus on the discovery, I really think it will help you (and others here) diagnostically if you would check your grep. You should certainly be finding the oom from your test, right? If not, maybe you had a problem with how you were grepping/searching for the errors. Try it again. Again, just trying to help, not making any accusations.

                                           

                                          /charlie

                                          • 18. Re: Bad performance on CF9 after upgrade from CF8
                                            carl type3 Community Member

                                            ARGH indeed,

                                             

                                            JVM args look OK. There is one thing I like to mention with the Xms and Xmx values set to the same. Many times this is fine to do that is set the initial heap to the same as maximum heap. With that setting sometimes I have seen issues where the heap, which is made up of NewGen and OldGen, make poor decisions for sizing NewGen. The problem occurs where NewGen which has Eden and two Survivor Spaces sets itself to a large size, OldGen is occupied enough, a minor Garbage Collection occurs and objects that are going to tenure (move from NewGen survivor space to OldGen) can not fit in the space as the tenure survivor space objects are more than what is free in OldGen.  To overcome that happening make a decision for the JVM and set a size for NewGen with either a ratio  or Xmn switch. To know if such is occurring you would need to enable JVM logging. Caveat - I do not think you are having a JVM issue with the server and JVM args in use, so I do not like to waste your time and add to frustration if it is the wrong path to take to rectify the matter.

                                             

                                            Interesting observation re works ok on CF8 but errors on CF9. You are at least CF9 update1?

                                             

                                            HTH, Carl.

                                            • 19. Re: Bad performance on CF9 after upgrade from CF8
                                              Steven Erat CommunityMVP

                                              This blog entry shows how to configure the sizing of the New Generation's

                                              Eden and Survivor spaces:

                                               

                                              http://www.webapper.com/blog/index.php/2006/06/08/20060606021131/

                                               

                                              With these you can set the min and max heap the same while not worrying

                                              about the generation size issues. Setting the min/max heap the same is known

                                              to improve performance as the primary benefit.

                                              • 20. Re: Bad performance on CF9 after upgrade from CF8
                                                carl type3 Community Member

                                                Hello,

                                                 

                                                One other  thing with JVM. On 32 bit there is a maximum at which point Jrun.exe (ie ColdFusion) will stop responding. With Max heap 1024 (Xmx1024) and MaxPermSize 512 perhaps both committed (however not necessarily holding objects) then your going to run over the 32 bit boundary as there are other buffers holding memory eg Code Cache, and some DLL’s.

                                                 

                                                Here is reference from Oracle as to limit:
                                                http://www.oracle.com/technetwork/java/hotspotfaq-138619.html#gc_heap_32bit

                                                 


                                                Using TASKMAN what memory is Jrun.exe using when you get the problem?

                                                 

                                                HTH, Carl.

                                                • 21. Re: Bad performance on CF9 after upgrade from CF8
                                                  erloz Community Member

                                                  @Charlie:

                                                   

                                                  Yes, when I voluntary generate the 500 error, I can see the OOM info in my log:

                                                   

                                                  06/01 11:28:57 error ROOT CAUSE:
                                                  java.lang.OutOfMemoryError: Java heap space

                                                   

                                                  It is more difficult to simulate, but I think I can have the same behaviour by stopping a download/upload. I will try this case as soon as possible.

                                                   

                                                  I've notice other strange things:

                                                   

                                                  • When a enable "verbose" in connector setting, I cannot reproduce the "503" error
                                                  • I can reproduce the error on every CF9 install (the connector/jrun version is always the same: 108795)
                                                    • Win2003/II6 with CF9/CF9.0.1 Entreprise/Standard
                                                    • Win2008/II7 with CF9/CF9.0.1 Standard (not tested with Entreprise)

                                                   

                                                  Other element:

                                                   

                                                  After the 500 error: JRun want to close the link with connector, but the connector is staying in "CLOSE_WAIT" status (that means the connector does not destroy the socket) while the JRUN is wainting for the "FIN" packet from Connector (FIN_WAIT2)...

                                                   

                                                  If I manualy kill the socket (with tcpview on windows for example), the behaviour is normal, the request after is not a 503.

                                                  If I do not kill the socket, the next request is everytime a 503 (for mapped ressource), and the connection to JRun is re-initialize.

                                                   

                                                  On CF8, after the same 500 error, the link between the connector and JRun is always ESTABLISHED.

                                                   

                                                  Is anyone can try to reproduce the case ?

                                                   

                                                  Is this issue could come from the "new" pool system of the connector ?

                                                  • 22. Re: Bad performance on CF9 after upgrade from CF8
                                                    charlie@carehart.org CommunityMVP

                                                    Good detective work, Erloz. As for others reproducing it, you may find that more difficult to expect from folks here, from my experience. Not many have enough understanding of the connector to really help, or may not have the time/motivation to setup and analyze test cases (myself included). But we'll see if any step up.

                                                     

                                                    Really, though, you have enough info and detailed observation now to approach Adobe support. Sure, you have to pay but if they confirm it a bug, your money is refunded. That said, it could end up that you instead have some config error, somewhere (CF, Jrun, web server, connector, OS). Since your problem doesn't seem something that others are experiencing widely, you have to consider that possibility.

                                                     

                                                    But if you want resolution, even at the risk of it not being a free call, it seems that contacting Adobe may now be the best next step, unless someone else dives in here. Sadly, you can't really expect that someone from Adobe will see this thread. These are not official support forums.

                                                     

                                                    But certainly some will be interested to hear whatever resolution you find, if you'll be so kind to share it even if you have to pay (since we all contribute what time we do here, trying to help, purely on a volunteer basis.)

                                                     

                                                    One last question, though: you refer to a "'new' pool system of the connector'. What's that in reference to? I just don't recall someone asserting that, nor have I heard of some change in 9.

                                                     

                                                    /charlie

                                                    • 23. Re: Bad performance on CF9 after upgrade from CF8
                                                      erloz Community Member

                                                      First, thanks to people who have spent his time on my case.

                                                       

                                                      About the connector of CF9, I've read informations at the bottom of the PDF: http://www.adobe.com/products/coldfusion/pdfs/cf9_performancebrief_ue.pdf

                                                       

                                                      Here is a quote of this document:

                                                      IIS maintains the worker thread pool size of 256 per CPU. (Some threads are used internally by IIS). When a request arrives, one thread is picked from the IIS worker thread pool and used until the request is completed. When a request arrives for the IIS thread, the request is blocked until ColdFusion serves the CFM page. Meanwhile, if a request comes in for other content types like HTTP or images, IIS uses other threads in the
                                                      pool to serve them. If all IIS worker threads are being blocked by ColdFusion requests, there will be no threads immediately available for IIS to serve pages of other content types.
                                                      When a connector receives a request in ColdFusion 9, it picks the native thread from its thread pool and returns the IIS worker thread with IO Completion (the request will be in IO block state). Once the IIS request is freed from the request, it is immediately available to handle the next request and to serve pages of other content types. Once the request is completed by ColdFusion, the connector thread is returned to its native thread pool.
                                                      Users can tune their applications using different native connector thread pool size values. By default, the native connector thread pool size is 25. This thread pool is defined as “maxworkerthread” in file [coldfusion_home]\runtime\lib\wsconfig\1\jrun_iis6_wildcard.ini.

                                                       

                                                      If I've a solution for this problem, I will come back to explain it here.

                                                      • 24. Re: Bad performance on CF9 after upgrade from CF8
                                                        charlie@carehart.org CommunityMVP

                                                        Sure, but Erloz, are you saying that that's what you thought was something "new"? It's not, as far as I know. That's just the perf brief for CF9. It's not all necessarily that the things being discussed were necessarily all "new". Again, just trying to help. Will look forward to hearing more that you may learn.

                                                         

                                                        /charlie

                                                        • 25. Re: Bad performance on CF9 after upgrade from CF8
                                                          carl type3 Community Member

                                                          +1 to know outcome / solution. Sorry I got not much more to add.

                                                          Regards, Carl.

                                                          • 26. Re: Bad performance on CF9 after upgrade from CF8
                                                            fukuchang0203

                                                            The graphics might still be a problem with CF/JRun, as the default context root means that the 高血圧  糖尿病 JRun connector receives every URL before deciding whether to process it with CF or return it to IIS. IIS itself isn't going to generate a 5xx error for static content.

                                                            • 27. Re: Bad performance on CF9 after upgrade from CF8
                                                              erloz Community Member

                                                              Hello,

                                                               

                                                              Just to inform that the bug I talk about has been verified by adobe coldfusion team (9 months after...), and is targeted for ColdFusion 10.0, Beta 1...

                                                              So, if you have discovered the same behaviour, be patient...

                                                               

                                                              Link to the bug tracker: http://cfbugs.adobe.com/cfbugreport/flexbugui/cfbugtracker/main.html#bugId=85523

                                                              • 28. Re: Bad performance on CF9 after upgrade from CF8
                                                                Joop Kaashoek Community Member

                                                                I did not read the entire thread but I recall that the oReilly servlet utilities for uploading files/multiparts etc  cause problem with CF9. We used them before as well but replaced them with the Apache commons fileupload utility, and that is working well.

                                                                 

                                                                As in our case, it could be that JRun gets crippled by the application code exceptions you are getting, it is a delicate server..

                                                                 

                                                                Let me know if you want code samplesfor the Apache FileUpload utility