Skip navigation
Currently Being Moderated

Bad performance on CF9 after upgrade from CF8

Jan 3, 2011 3:10 AM

Hello and happy new year for everybody 

 

I'm stuck with a problem after upgrade from CF8 entreprise to CF9 entreprise.

We are using IIS6 on WIN2003R2 (SP2).

 

There are a lot of  "503 Server Error" with no content. (about 0.05% of request). This can be occured on big or small CFM, graphic ressources, etc...

For each error I can see the following error on the connector logs (jrun_iis6_wildcardYYMMDD.log).

2011-01-02 19:00:05 jrISAPI[1456:3940]  returning error page for JRun too busy or out of memory

 

I've never seen this message in previous CF connector version.

 

And on the jrun log (coldfusion-outXXX.log), I read the following errors:

02/01 20:53:31 error unexpected end of part
java.io.IOException: unexpected end of part
    at com.oreilly.servlet.multipart.PartInputStream.fill(PartInputStream.ja va:96)
    at com.oreilly.servlet.multipart.PartInputStream.read(PartInputStream.ja va:191)
    at com.oreilly.servlet.multipart.PartInputStream.read(PartInputStream.ja va:152)
    at com.oreilly.servlet.multipart.FilePart.write(FilePart.java:257)
    at com.oreilly.servlet.multipart.FilePart.writeTo(FilePart.java:215)
    at coldfusion.filter.FormScope.fillForm(FormScope.java:252)
    at coldfusion.filter.FusionContext.SymTab_initForRequest(FusionContext.j ava:376)
    at coldfusion.filter.GlobalsFilter.invoke(GlobalsFilter.java:33)
    at coldfusion.filter.DatasourceFilter.invoke(DatasourceFilter.java:22)
    at coldfusion.filter.CachingFilter.invoke(CachingFilter.java:53)
    at coldfusion.filter.RequestThrottleFilter.invoke(RequestThrottleFilter. java:126)
    at coldfusion.CfmServlet.service(CfmServlet.java:200)
    at coldfusion.bootstrap.BootstrapServlet.service(BootstrapServlet.java:8 9)
    at jrun.servlet.FilterChain.doFilter(FilterChain.java:86)
    at coldfusion.monitor.event.MonitoringServletFilter.doFilter(MonitoringS ervletFilter.java:42)
    at coldfusion.bootstrap.BootstrapFilter.doFilter(BootstrapFilter.java:46 )
    at jrun.servlet.FilterChain.doFilter(FilterChain.java:94)
    at jrun.servlet.FilterChain.service(FilterChain.java:101)
    at jrun.servlet.ServletInvoker.invoke(ServletInvoker.java:106)
    at jrun.servlet.JRunInvokerChain.invokeNext(JRunInvokerChain.java:42)
    at jrun.servlet.JRunRequestDispatcher.invoke(JRunRequestDispatcher.java: 286)
    at jrun.servlet.ServletEngineService.dispatch(ServletEngineService.java: 543)
    at jrun.servlet.jrpp.JRunProxyService.invokeRunnable(JRunProxyService.ja va:203)
    at jrunx.scheduler.ThreadPool$DownstreamMetrics.invokeRunnable(ThreadPoo l.java:320)
    at jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.j ava:428)
    at jrunx.scheduler.ThreadPool$UpstreamMetrics.invokeRunnable(ThreadPool. java:266)
    at jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)

02/01 20:53:31 error (JRun Service: ProxyService [jrun.servlet.jrpp.JRunProxyService@4006ca]) JRunPRoxyServer.invokeRunnable:
java.lang.IllegalStateException
    at jrun.servlet.JRunResponse.getWriter(JRunResponse.java:205)
    at jrun.servlet.JRunResponse.sendError(JRunResponse.java:597)
    at jrun.servlet.JRunRequestDispatcher.invoke(JRunRequestDispatcher.java: 328)
    at jrun.servlet.ServletEngineService.dispatch(ServletEngineService.java: 543)
    at jrun.servlet.jrpp.JRunProxyService.invokeRunnable(JRunProxyService.ja va:203)
    at jrunx.scheduler.ThreadPool$DownstreamMetrics.invokeRunnable(ThreadPoo l.java:320)
    at jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.j ava:428)
    at jrunx.scheduler.ThreadPool$UpstreamMetrics.invokeRunnable(ThreadPool. java:266)
    at jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)

java.lang.IllegalStateException
    at jrun.servlet.JRunResponse.getWriter(JRunResponse.java:205)
    at jrun.servlet.JRunResponse.sendError(JRunResponse.java:597)
    at jrun.servlet.JRunRequestDispatcher.invoke(JRunRequestDispatcher.java: 328)
    at jrun.servlet.ServletEngineService.dispatch(ServletEngineService.java: 543)
    at jrun.servlet.jrpp.JRunProxyService.invokeRunnable(JRunProxyService.ja va:203)
    at jrunx.scheduler.ThreadPool$DownstreamMetrics.invokeRunnable(ThreadPoo l.java:320)
    at jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.j ava:428)
    at jrunx.scheduler.ThreadPool$UpstreamMetrics.invokeRunnable(ThreadPool. java:266)
    at jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)

 

 

Connector settings (jrun_iis6_wildcard.ini) :

 

  • verbose=false
  • Buffer=true (I've tried with false)
  • serverstore=C:/ColdFusion9/runtime/lib/wsconfig/1/jrunserver.store
  • bootstrap=127.0.0.1:51800
  • apialloc=false
  • maxworkerthreads=100 (I've tried to increase/decrease this value)

 

JRUN settings:

 

  • Threads number: 150
  • Max queued jrun threads: 150

 

Is anyone have experience the same issue ? How can I try to solve this ?

 

Big thanks!

 
Replies
  • Currently Being Moderated
    Jan 3, 2011 9:11 AM   in reply to erloz

    Right off the bat, I suspect that setting the JRun thread pool size to 150

    is probably hurting you during periods of higher traffic.  See my blog entry

    for more:

    http://www.talkingtree.com/blog/index.cfm/2009/8/3/ColdFusion-Request- Tuning-Settings-in-Depth

     

    Also, you mention that the problem occurs on CFM files and graphic

    resources.  This is an important distinction.  One is dynamic content, the

    other is static content.  If you are experiencing an increase in 503 errors

    for static content (html, gif, jpg, css, js) as verified in the IIS logs

    then you may have a problem that is not related to ColdFusion.

     
    |
    Mark as:
  • Dave Watts
    747 posts
    Mar 11, 2003
    Currently Being Moderated
    Jan 3, 2011 9:22 AM   in reply to Steven Erat

    Hi, Steven!

     

    The graphics might still be a problem with CF/JRun, as the default context root means that the JRun connector receives every URL before deciding whether to process it with CF or return it to IIS. IIS itself isn't going to generate a 5xx error for static content.

     

    Dave Watts, CTO, Fig Leaf Software

    http://www.figleaf.com/

    http://training.figleaf.com/

     
    |
    Mark as:
  • Currently Being Moderated
    Jan 3, 2011 10:18 AM   in reply to Dave Watts

    Hi Dave,

     

    Its been a few years since I dug into connector filtering (big caveat as

    this is a bit of a meander and some of this is thinking out loud to refresh

    myself), but as I recall the connector property ignoresuffixmap has a

    default of true, and the result of which should be to not send all URL

    patterns to JRun for processing, but rather to let IIS determine if a

    request URL should be handled by IIS, by JRun, or by something else.

     

    I believe the sequence of events is that when the connector starts up it

    gathers a list of servlet mappings from JRun for things like *.cfm, or

    /flashremoting/, etc and that when ignoresuffixmap is true a request for

    /foo.cfm would show in the verbose connector logs as "filtering ignored

    suffix .cfm" followed by "filterRequest no match", then control would return

    to IIS for further extension mapping.  If ignoresuffixmap were changed to

    false, then the connector would begin attempting to pattern match the URL to

    known servlet mappings for the JRun server and then send them to JRun

    immediately for processing without letting IIS inspect the URL further.

     

    If there are no servlet mappings for static content types, and IIS is not

    configured to send static content types to CF (as some people do "hide" the

    app server type by using .html extensions instead of .cfm), then no matter

    the value of ignoresuffixmap, the connector wouldn't send the URL to be

    processed by JRun.  That means that if JRun were down/busy/unavailable to

    process requests, it wouldn't affect requests for static content types of

    html, css, js, jpg, etc because the connector would never send the request

    over to JRun.

     

    I suppose, however, that if there were some kind of problem with the

    connector's ability to perform the filtering (including filtering to ignore

    suffixes), then the connector could (possibly?) be the culprit returning 503

    status codes back to IIS even though the actual JRun server itself was never

    brought into the loop.  ... Finally, this is the "Ah ha!" moment when I

    realize that you are correct.

     

    So in short: What Dave said.  It could be the connector.  You might want to

    enable connector logging as the next step in finding out to what's going on.

     
    |
    Mark as:
  • Currently Being Moderated
    Jan 4, 2011 7:57 AM   in reply to erloz


    For now, I'd start with the basics. 

     

    - Tuning the request pools (JRun active handler request pool, then secondarily the 4 CF request pools cfm/cfc/ws/fr).  I still suspect that having the JRun running request pool (maxactivehandlers) set to 150 will a problem at some point.  During any heavy traffic, the JVM could work the CPUs too hard and unnecessarily jack up memory requirements until the server goes into a thrashing state.  I could be wrong since I don't know your hardware or your application, but my experience with helping others tune servers has shown that lower numbers produce better throughput and reduce/eliminate the thrashing of resources.

     

    - Analyze the JRun logs for "Java heap space", OutOfMemoryError, and watch the JRun/bin directory for the presence of Hotspot crash logs (logs starting with "hs..."  Can you trace down any of these at all?  There's a whole line of questioning to follow up on this if found.

     

    - Can you enable JRun metrics logging? This will track active/busy request count as well as memory used/free.  It would be good to know if there is a relationship between high values for reqs or memory with the start of the 503 errors.

     

    Also, when exactly do the 503 errors start?  Would you say it happens right after CF/IIS restart, or only after a surge in traffic, or seemingly random?  Can you correlate the frequency of the errors with any other events like maybe some scheduled tasks or ... ?

     

    Regarding the connector, I found one possibly related thread about how a change to web.xml to add a new extension for CF files had caused 503s.  See comments by "Cheve"at

    http://www.simonwhatley.co.uk/poor-mans-http-compression-with-coldfusi on

     

    Also found a CF bug where a developer was purposefully creating a memory leak to test a scenario, and it resulted in the 503 error, correlated with the Java Heap Space error message: http://www.elliottsprehn.com/cfbugs/bugs/78721#stack-trace   The bug is marked closed, but with no comments by Adobe about its status or any explanation.   I just re-read your original post, and I see now that your CF9 connector logs were mentioning a possible memory situation "JRun too busy or out of memory" and that CF8 did not.  So that might be more reason to seek down evidence of memory issues in the logs and try to trace it back to the app.  I also noticed that your original post had stack traces that also included IllegalStateException errors at the same time, and that bug report has a similar pattern of showing that exception right around the time of the memory error.  In your case, preceding that is the com.oreilly stack trace refering to unexpected end of part.  That com.oreilly library is responsible for handling file uploads, so you may want to purse that angle as well.

     

    Hope this helps

     
    |
    Mark as:
  • Currently Being Moderated
    Jan 4, 2011 10:43 AM   in reply to erloz

    I believe that handledRq refers to how many requests were handled or

    processed to completion during the last metric logging interval (1 minute).

     

    The metrics logging shows that memory is fine (free remains high), and that

    you have about 10-14 active requests at the moment of logging, with a high

    water mark of 42 handler threads with 5 or 6 that have been inactive long

    enough to be considered idle.  The requests handled per minute has a 50%+

    increase between 18:24 and 18:27, a minor surge, but still none shown for

    delay.

     

    Seven years ago I once blogged about how if JRun queues requests and those

    queued requests timeout there without ever running that JRun will return a

    503 status code with the error message "returning error page for JRun too

    busy or out of memory".  However, it doesn't appear to be the case here

    since your JRun metrics shows 0 for delay.

     

    The information you provided so far has been helpful, but at this point

    there is still not enough for me to make a clear conclusion.  As the next

    step I'd take several "snapshots" from the Server Monitor while the 503

    errors are occurring (snapshots are triggered via a web request, so

    generating a thread dump directly on the OS to bypass the web may be a

    better option).  The snapshots/threaddumps will show you exactly what

    threads are running, how many, what they are doing down to the line of code

    (java source code, not just cfml code).

     

    Also, you may also want to contact a ColdFusion consultant to help you

    troubleshoot this for you further.  I recommend FigLeaf.com or go2ria.com.

     
    |
    Mark as:
  • Currently Being Moderated
    Jan 4, 2011 6:00 PM   in reply to erloz

    Although I'm certainly willing to continue trying to help out here with the

    problem under discussion, this form of help might be to slow for you. So

    once again I'll suggest that if this is an urgent business need, then it's

    really in your best interest to hire a dedicated and experienced consultant

    to help resolve this.

     

    In addition to the consultants I mentioned earlier, I'll point out that

    Charlie Arehart maintains a very helpful resource center, CF411.com,

    intended to help people in just this situation.  One of the resources is a

    larger list of CF consulting companies.  Charlie himself is known to be

    among the very best at diving into all manner of ColdFusion performance

    issues.

     

    See this for more:

     

    CF-oriented Troubleshooting Consultant

    http://www.cf411.com/#cfassist

     
    |
    Mark as:
  • Currently Being Moderated
    Jan 4, 2011 7:20 PM   in reply to Steven Erat

    Thanks for that, Steven. I've stayed quiet in this thread because so many others had jumped in and were offering useful info.

     

    I'll add this, though: erloz, you refer to errors you're seeing in the web server connector logs. What are you seeing instead in the Re: Bad performance on CF9 after upgrade from CF8\runtime\logs (or Re: Bad performance on CF9 after upgrade from CF8\logs, if on Multiserver)? I know you've seen errors related to non-CF files, but given the wildcard filter in IIS and how the connector does seem to at least look at non-CF files briefly, what you may see in the connector logs may be more of the result of a more serious error having happened first in processing CF pages that may make it non-responsive to the connector. That's a bit of a wild guess, but it's worth checking. In particular, look in those logs for any "outofmemory" errors (search for that specific string).

     

    I have a blog entry that helps address some of what you may see if you find them:

     

    http://www.carehart.org/blog/client/index.cfm/2010/11/3/when_memory_pr oblems_arent_what_they_seem_part_1

     

    Finally, you don't make it clear: are these errors happening on ALL requests, all the time? You referred to it as part of something happening on an upgrade from CF 8 to 9. Has this been since the upgrade? Constant? Periodic? Any additional info may help.

     

    /charlie

     
    |
    Mark as:
  • Currently Being Moderated
    Jan 5, 2011 1:30 AM   in reply to erloz

    Hello,

     

    Looking thru the thread you do not say what is the bit 32 or 64? I mention because on 64 bit I have noticed the default MaxPermSize inadequate, which leads to many major Garbage Collections - part of major GC being stop all Java process hence bad for performance.

     

    With the Java messages in the OUT log I am curious as to the JVM arguments in CF\runtime\bin\JVM.CONFIG?

     

    I am not entirely sure CF JVM is the cause of problem however I would not discount it particularly since the jrun_iis6_wildcardYYMMDD.log - returning error page for JRun too busy or out of memory (tho I not seen that one before myself).

     

    HTH, Carl.
     
    |
    Mark as:
  • Currently Being Moderated
    Jan 5, 2011 4:38 AM   in reply to erloz

    A couple points based on latest discussion:

     

    1) Permanent Generation size is not a problem unless you start seeing

    "java.lang.OutOfMemoryError PermGen Space".

     

    2) People keep referring to memory errors because the error message suggests

    it as a possibility, "JRun too busy or out of memory".  But that's just the

    connector taking a guess as to why JRun server isn't responding to the

    connector.

     

    3) You should not use a CF8 connector for a CF9 server because the connector

    is versioned for JRun builds, so the hooks or communication is tightly

    coupled between them.  Attempting to do that will likely cause additional

    problems.

     

    In the past I demonstrated how the 503 status code with the "JRun too busy

    or out of memory" message can be caused by JRun requests timing out in the

    queue before getting to run.  But based on the JRun metrics log you provided

    it did not appear to be the problem because requests were not queuing.  I

    still think you need a more comprehensive analysis to include settings

    review, log review, and snapshot/thread dump analysis.

     
    |
    Mark as:
  • Currently Being Moderated
    Jan 5, 2011 6:44 AM   in reply to erloz

    Continuing to research this, I ran across this thread which discusses how a

    503 status code might be returned when a user is performing a file upload

    and the user cancels the request.  More details at:

    http://www.houseoffusion.com/groups/cf-talk/thread.cfm/threadid:38705

     

    Perhaps the associated static content 503 errors are occur from the same

    user, say there was a form to upload where the action page had static

    content like css, jpg, js, etc, and the form upload cancellation caused all

    resource requests for the action page to return a 503.  Just a hunch.

     
    |
    Mark as:
  • Currently Being Moderated
    Jan 5, 2011 8:46 AM   in reply to erloz

    Erloz, are you saying you just eyeballed the logs (to find any outofmemory errors) or that you searched the logs? You really must do the latter, if you want to be absolutely sure it's not an issue. You could miss them on one line, especially with all that other stuff (which in my experience is more from the result of other problems, not the cause of them). Again, I address how best to do that search in my blog entry.

     

    Steven's idea about stopped uploads is certainly interesting, too, if there's some limit to when they happen for you.

     

    Finally, have you searched your web server logs to find the requests getting 503 errors? You may find a patter, whether in the domain, the filename, the IP address, the request method, the time, etc.

     

    /charlie

     
    |
    Mark as:
  • Currently Being Moderated
    Jan 5, 2011 9:24 AM   in reply to erloz

    If you monitor the total heap size (used/allocated/max) and watch for

    garbage collection, it might be that when the server hits an

    outofmemoryerror it will continue to not respond to the connector until GC

    kicks in and frees some memory up.  I suspect that GC runs and completes

    just before the server starts processing requests again.  HTTP 503 status

    code is kind of a catch all for server not available, which would be the

    case after an OOME for a brief period.

     
    |
    Mark as:
  • Currently Being Moderated
    Jan 5, 2011 9:46 AM   in reply to erloz

    That's an interesting observation, but then I would ask: are you seeing the outofmemory error in the runtime log when you run the test? If so, then if your theory was correct, you'd expect to have found other outofmemory errors from the preceding times this was happening.

     

    So if you don't mind, while you may want to focus on the discovery, I really think it will help you (and others here) diagnostically if you would check your grep. You should certainly be finding the oom from your test, right? If not, maybe you had a problem with how you were grepping/searching for the errors. Try it again. Again, just trying to help, not making any accusations.

     

    /charlie

     
    |
    Mark as:
  • Currently Being Moderated
    Jan 5, 2011 2:59 PM   in reply to erloz

    ARGH indeed,

     

    JVM args look OK. There is one thing I like to mention with the Xms and Xmx values set to the same. Many times this is fine to do that is set the initial heap to the same as maximum heap. With that setting sometimes I have seen issues where the heap, which is made up of NewGen and OldGen, make poor decisions for sizing NewGen. The problem occurs where NewGen which has Eden and two Survivor Spaces sets itself to a large size, OldGen is occupied enough, a minor Garbage Collection occurs and objects that are going to tenure (move from NewGen survivor space to OldGen) can not fit in the space as the tenure survivor space objects are more than what is free in OldGen.  To overcome that happening make a decision for the JVM and set a size for NewGen with either a ratio  or Xmn switch. To know if such is occurring you would need to enable JVM logging. Caveat - I do not think you are having a JVM issue with the server and JVM args in use, so I do not like to waste your time and add to frustration if it is the wrong path to take to rectify the matter.

     

    Interesting observation re works ok on CF8 but errors on CF9. You are at least CF9 update1?

     

    HTH, Carl.

     
    |
    Mark as:
  • Currently Being Moderated
    Jan 5, 2011 3:54 PM   in reply to carl type3

    This blog entry shows how to configure the sizing of the New Generation's

    Eden and Survivor spaces:

     

    http://www.webapper.com/blog/index.php/2006/06/08/20060606021131/

     

    With these you can set the min and max heap the same while not worrying

    about the generation size issues. Setting the min/max heap the same is known

    to improve performance as the primary benefit.

     
    |
    Mark as:
  • Currently Being Moderated
    Jan 5, 2011 5:04 PM   in reply to carl type3

    Hello,

     

    One other  thing with JVM. On 32 bit there is a maximum at which point Jrun.exe (ie ColdFusion) will stop responding. With Max heap 1024 (Xmx1024) and MaxPermSize 512 perhaps both committed (however not necessarily holding objects) then your going to run over the 32 bit boundary as there are other buffers holding memory eg Code Cache, and some DLL’s.

     

    Here is reference from Oracle as to limit:
    http://www.oracle.com/technetwork/java/hotspotfaq-138619.html#gc_heap_ 32bit

     


    Using TASKMAN what memory is Jrun.exe using when you get the problem?

     

    HTH, Carl.

     
    |
    Mark as:
  • Currently Being Moderated
    Jan 6, 2011 7:53 AM   in reply to erloz

    Good detective work, Erloz. As for others reproducing it, you may find that more difficult to expect from folks here, from my experience. Not many have enough understanding of the connector to really help, or may not have the time/motivation to setup and analyze test cases (myself included). But we'll see if any step up.

     

    Really, though, you have enough info and detailed observation now to approach Adobe support. Sure, you have to pay but if they confirm it a bug, your money is refunded. That said, it could end up that you instead have some config error, somewhere (CF, Jrun, web server, connector, OS). Since your problem doesn't seem something that others are experiencing widely, you have to consider that possibility.

     

    But if you want resolution, even at the risk of it not being a free call, it seems that contacting Adobe may now be the best next step, unless someone else dives in here. Sadly, you can't really expect that someone from Adobe will see this thread. These are not official support forums.

     

    But certainly some will be interested to hear whatever resolution you find, if you'll be so kind to share it even if you have to pay (since we all contribute what time we do here, trying to help, purely on a volunteer basis.)

     

    One last question, though: you refer to a "'new' pool system of the connector'. What's that in reference to? I just don't recall someone asserting that, nor have I heard of some change in 9.

     

    /charlie

     
    |
    Mark as:
  • Currently Being Moderated
    Jan 6, 2011 9:10 AM   in reply to erloz

    Sure, but Erloz, are you saying that that's what you thought was something "new"? It's not, as far as I know. That's just the perf brief for CF9. It's not all necessarily that the things being discussed were necessarily all "new". Again, just trying to help. Will look forward to hearing more that you may learn.

     

    /charlie

     
    |
    Mark as:
  • Currently Being Moderated
    Jan 6, 2011 3:42 PM   in reply to erloz

    +1 to know outcome / solution. Sorry I got not much more to add.

    Regards, Carl.

     
    |
    Mark as:
  • Currently Being Moderated
    Jan 19, 2011 8:07 AM   in reply to erloz

    The graphics might still be a problem with CF/JRun, as the default context root means that the 高血圧  糖尿病 JRun connector receives every URL before deciding whether to process it with CF or return it to IIS. IIS itself isn't going to generate a 5xx error for static content.

     
    |
    Mark as:
  • Currently Being Moderated
    Jun 26, 2012 8:16 AM   in reply to erloz

    I did not read the entire thread but I recall that the oReilly servlet utilities for uploading files/multiparts etc  cause problem with CF9. We used them before as well but replaced them with the Apache commons fileupload utility, and that is working well.

     

    As in our case, it could be that JRun gets crippled by the application code exceptions you are getting, it is a delicate server..

     

    Let me know if you want code samplesfor the Apache FileUpload utility

     
    |
    Mark as:

More Like This

  • Retrieving data ...

Bookmarked By (0)

Answers + Points = Status

  • 10 points awarded for Correct Answers
  • 5 points awarded for Helpful Answers
  • 10,000+ points
  • 1,001-10,000 points
  • 501-1,000 points
  • 5-500 points