• Global community
    • Language:
      • Deutsch
      • English
      • Español
      • Français
      • Português
  • 日本語コミュニティ
    Dedicated community for Japanese speakers
  • 한국 커뮤니티
    Dedicated community for Korean speakers
Exit
0

Bad performance on CF9 after upgrade from CF8

Guest
Jan 03, 2011 Jan 03, 2011

Copy link to clipboard

Copied

Hello and happy new year for everybody 

I'm stuck with a problem after upgrade from CF8 entreprise to CF9 entreprise.

We are using IIS6 on WIN2003R2 (SP2).

There are a lot of  "503 Server Error" with no content. (about 0.05% of request). This can be occured on big or small CFM, graphic ressources, etc...

For each error I can see the following error on the connector logs (jrun_iis6_wildcardYYMMDD.log).

2011-01-02 19:00:05 jrISAPI[1456:3940]  returning error page for JRun too busy or out of memory

I've never seen this message in previous CF connector version.

And on the jrun log (coldfusion-outXXX.log), I read the following errors:

02/01 20:53:31 error unexpected end of part
java.io.IOException: unexpected end of part
    at com.oreilly.servlet.multipart.PartInputStream.fill(PartInputStream.java:96)
    at com.oreilly.servlet.multipart.PartInputStream.read(PartInputStream.java:191)
    at com.oreilly.servlet.multipart.PartInputStream.read(PartInputStream.java:152)
    at com.oreilly.servlet.multipart.FilePart.write(FilePart.java:257)
    at com.oreilly.servlet.multipart.FilePart.writeTo(FilePart.java:215)
    at coldfusion.filter.FormScope.fillForm(FormScope.java:252)
    at coldfusion.filter.FusionContext.SymTab_initForRequest(FusionContext.java:376)
    at coldfusion.filter.GlobalsFilter.invoke(GlobalsFilter.java:33)
    at coldfusion.filter.DatasourceFilter.invoke(DatasourceFilter.java:22)
    at coldfusion.filter.CachingFilter.invoke(CachingFilter.java:53)
    at coldfusion.filter.RequestThrottleFilter.invoke(RequestThrottleFilter.java:126)
    at coldfusion.CfmServlet.service(CfmServlet.java:200)
    at coldfusion.bootstrap.BootstrapServlet.service(BootstrapServlet.java:89)
    at jrun.servlet.FilterChain.doFilter(FilterChain.java:86)
    at coldfusion.monitor.event.MonitoringServletFilter.doFilter(MonitoringServletFilter.java:42)
    at coldfusion.bootstrap.BootstrapFilter.doFilter(BootstrapFilter.java:46)
    at jrun.servlet.FilterChain.doFilter(FilterChain.java:94)
    at jrun.servlet.FilterChain.service(FilterChain.java:101)
    at jrun.servlet.ServletInvoker.invoke(ServletInvoker.java:106)
    at jrun.servlet.JRunInvokerChain.invokeNext(JRunInvokerChain.java:42)
    at jrun.servlet.JRunRequestDispatcher.invoke(JRunRequestDispatcher.java:286)
    at jrun.servlet.ServletEngineService.dispatch(ServletEngineService.java:543)
    at jrun.servlet.jrpp.JRunProxyService.invokeRunnable(JRunProxyService.java:203)
    at jrunx.scheduler.ThreadPool$DownstreamMetrics.invokeRunnable(ThreadPool.java:320)
    at jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.java:428)
    at jrunx.scheduler.ThreadPool$UpstreamMetrics.invokeRunnable(ThreadPool.java:266)
    at jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)

02/01 20:53:31 error (JRun Service: ProxyService [jrun.servlet.jrpp.JRunProxyService@4006ca]) JRunPRoxyServer.invokeRunnable:
java.lang.IllegalStateException
    at jrun.servlet.JRunResponse.getWriter(JRunResponse.java:205)
    at jrun.servlet.JRunResponse.sendError(JRunResponse.java:597)
    at jrun.servlet.JRunRequestDispatcher.invoke(JRunRequestDispatcher.java:328)
    at jrun.servlet.ServletEngineService.dispatch(ServletEngineService.java:543)
    at jrun.servlet.jrpp.JRunProxyService.invokeRunnable(JRunProxyService.java:203)
    at jrunx.scheduler.ThreadPool$DownstreamMetrics.invokeRunnable(ThreadPool.java:320)
    at jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.java:428)
    at jrunx.scheduler.ThreadPool$UpstreamMetrics.invokeRunnable(ThreadPool.java:266)
    at jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)

java.lang.IllegalStateException
    at jrun.servlet.JRunResponse.getWriter(JRunResponse.java:205)
    at jrun.servlet.JRunResponse.sendError(JRunResponse.java:597)
    at jrun.servlet.JRunRequestDispatcher.invoke(JRunRequestDispatcher.java:328)
    at jrun.servlet.ServletEngineService.dispatch(ServletEngineService.java:543)
    at jrun.servlet.jrpp.JRunProxyService.invokeRunnable(JRunProxyService.java:203)
    at jrunx.scheduler.ThreadPool$DownstreamMetrics.invokeRunnable(ThreadPool.java:320)
    at jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.java:428)
    at jrunx.scheduler.ThreadPool$UpstreamMetrics.invokeRunnable(ThreadPool.java:266)
    at jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)

Connector settings (jrun_iis6_wildcard.ini) :

  • verbose=false
  • Buffer=true (I've tried with false)
  • serverstore=C:/ColdFusion9/runtime/lib/wsconfig/1/jrunserver.store
  • bootstrap=127.0.0.1:51800
  • apialloc=false
  • maxworkerthreads=100 (I've tried to increase/decrease this value)

JRUN settings:

  • Threads number: 150
  • Max queued jrun threads: 150

Is anyone have experience the same issue ? How can I try to solve this ?

Big thanks!

Views

8.1K

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Engaged ,
Jan 03, 2011 Jan 03, 2011

Copy link to clipboard

Copied

Right off the bat, I suspect that setting the JRun thread pool size to 150

is probably hurting you during periods of higher traffic. See my blog entry

for more:

http://www.talkingtree.com/blog/index.cfm/2009/8/3/ColdFusion-Request-Tuning-Settings-in-Depth

Also, you mention that the problem occurs on CFM files and graphic

resources. This is an important distinction. One is dynamic content, the

other is static content. If you are experiencing an increase in 503 errors

for static content (html, gif, jpg, css, js) as verified in the IIS logs

then you may have a problem that is not related to ColdFusion.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Community Expert ,
Jan 03, 2011 Jan 03, 2011

Copy link to clipboard

Copied

Hi, Steven!

The graphics might still be a problem with CF/JRun, as the default context root means that the JRun connector receives every URL before deciding whether to process it with CF or return it to IIS. IIS itself isn't going to generate a 5xx error for static content.

Dave Watts, CTO, Fig Leaf Software

http://www.figleaf.com/

http://training.figleaf.com/

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Engaged ,
Jan 03, 2011 Jan 03, 2011

Copy link to clipboard

Copied

Hi Dave,

Its been a few years since I dug into connector filtering (big caveat as

this is a bit of a meander and some of this is thinking out loud to refresh

myself), but as I recall the connector property ignoresuffixmap has a

default of true, and the result of which should be to not send all URL

patterns to JRun for processing, but rather to let IIS determine if a

request URL should be handled by IIS, by JRun, or by something else.

I believe the sequence of events is that when the connector starts up it

gathers a list of servlet mappings from JRun for things like *.cfm, or

/flashremoting/, etc and that when ignoresuffixmap is true a request for

/foo.cfm would show in the verbose connector logs as "filtering ignored

suffix .cfm" followed by "filterRequest no match", then control would return

to IIS for further extension mapping. If ignoresuffixmap were changed to

false, then the connector would begin attempting to pattern match the URL to

known servlet mappings for the JRun server and then send them to JRun

immediately for processing without letting IIS inspect the URL further.

If there are no servlet mappings for static content types, and IIS is not

configured to send static content types to CF (as some people do "hide" the

app server type by using .html extensions instead of .cfm), then no matter

the value of ignoresuffixmap, the connector wouldn't send the URL to be

processed by JRun. That means that if JRun were down/busy/unavailable to

process requests, it wouldn't affect requests for static content types of

html, css, js, jpg, etc because the connector would never send the request

over to JRun.

I suppose, however, that if there were some kind of problem with the

connector's ability to perform the filtering (including filtering to ignore

suffixes), then the connector could (possibly?) be the culprit returning 503

status codes back to IIS even though the actual JRun server itself was never

brought into the loop. ... Finally, this is the "Ah ha!" moment when I

realize that you are correct.

So in short: What Dave said. It could be the connector. You might want to

enable connector logging as the next step in finding out to what's going on.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Guest
Jan 04, 2011 Jan 04, 2011

Copy link to clipboard

Copied

I will try to play with extensions management to ignore static content, but with CF8, all content are handle by JRun, and I've never seen "returning error page for JRun too busy or out of memory".

Do you know exactly when this log output is displayed ? I cannot enable verbose log when server is in production (really too big), and I didn't succeed to reproduce the case.

However, I've captured a connection with wireshark when the error has occured. There is no delay during the request, so I think the problem is not linked with sort of timeout or something like that.

GET / HTTP/1.1

User-Agent: check_http/v1861 (nagios-plugins 1.4.11)

Connection: close

Host: xxxxxxxxx


HTTP/1.1 503 Server Error

Connection: close

Date: Thu, 30 Dec 2010 01:26:40 GMT

Server: Microsoft-IIS/6.0

content-type: text/html


(no more content in response)

Edit: I've enable verbose log... and checking disk space

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Engaged ,
Jan 04, 2011 Jan 04, 2011

Copy link to clipboard

Copied


For now, I'd start with the basics. 

- Tuning the request pools (JRun active handler request pool, then secondarily the 4 CF request pools cfm/cfc/ws/fr).  I still suspect that having the JRun running request pool (maxactivehandlers) set to 150 will a problem at some point.  During any heavy traffic, the JVM could work the CPUs too hard and unnecessarily jack up memory requirements until the server goes into a thrashing state.  I could be wrong since I don't know your hardware or your application, but my experience with helping others tune servers has shown that lower numbers produce better throughput and reduce/eliminate the thrashing of resources.

- Analyze the JRun logs for "Java heap space", OutOfMemoryError, and watch the JRun/bin directory for the presence of Hotspot crash logs (logs starting with "hs..."  Can you trace down any of these at all?  There's a whole line of questioning to follow up on this if found.

- Can you enable JRun metrics logging? This will track active/busy request count as well as memory used/free.  It would be good to know if there is a relationship between high values for reqs or memory with the start of the 503 errors.

Also, when exactly do the 503 errors start?  Would you say it happens right after CF/IIS restart, or only after a surge in traffic, or seemingly random?  Can you correlate the frequency of the errors with any other events like maybe some scheduled tasks or ... ?

Regarding the connector, I found one possibly related thread about how a change to web.xml to add a new extension for CF files had caused 503s.  See comments by "Cheve"at

http://www.simonwhatley.co.uk/poor-mans-http-compression-with-coldfusion

Also found a CF bug where a developer was purposefully creating a memory leak to test a scenario, and it resulted in the 503 error, correlated with the Java Heap Space error message: http://www.elliottsprehn.com/cfbugs/bugs/78721#stack-trace   The bug is marked closed, but with no comments by Adobe about its status or any explanation.   I just re-read your original post, and I see now that your CF9 connector logs were mentioning a possible memory situation "JRun too busy or out of memory" and that CF8 did not.  So that might be more reason to seek down evidence of memory issues in the logs and try to trace it back to the app.  I also noticed that your original post had stack traces that also included IllegalStateException errors at the same time, and that bug report has a similar pattern of showing that exception right around the time of the memory error.  In your case, preceding that is the com.oreilly stack trace refering to unexpected end of part.  That com.oreilly library is responsible for handling file uploads, so you may want to purse that angle as well.

Hope this helps

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Guest
Jan 04, 2011 Jan 04, 2011

Copy link to clipboard

Copied

Here is the day report.

I've enable verbose log on connector almost all the day. I've seen ZERO "returning error page for JRun too busy or out of memory" in my multigigabit log! Around 5:00 PM I've turned off debug, and in the 30 minutes after, 3 lines "returning error page for JRun too busy or out of memory" ... With the 3 "503" error in my IIS log file.

Or I've very unlucky, or the connector is really boggy!!!??

For now, I've turned on metrics on JRun.

Connector log:

2011-01-04 18:25:55 jrISAPI[3912:4140]  returning error page for JRun too busy or out of memory

2011-01-04 18:26:26 jrISAPI[9412:4140]  returning error page for JRun too busy or out of memory

2011-01-04 18:26:30 jrISAPI[1532:4140]  returning error page for JRun too busy or out of memory

JRun metrics:

04/01 18:24:00 metrics Web threads (busy/total/listen/idle/delay/drop/handle): 14/40/22/4/0/0/147 Sessions: 25 Total Memory=1036288 Free=891897

04/01 18:25:00 metrics Web threads (busy/total/listen/idle/delay/drop/handle): 12/40/22/6/0/0/109 Sessions: 34 Total Memory=1038208 Free=947887

04/01 18:26:00 metrics Web threads (busy/total/listen/idle/delay/drop/handle): 13/42/23/6/0/0/203 Sessions: 42 Total Memory=1039296 Free=940044

04/01 18:27:00 metrics Web threads (busy/total/listen/idle/delay/drop/handle): 10/42/25/7/0/0/232 Sessions: 54 Total Memory=1037760 Free=937702

04/01 18:28:00 metrics Web threads (busy/total/listen/idle/delay/drop/handle): 13/42/25/4/0/0/247 Sessions: 64 Total Memory=1040768 Free=889198

04/01 18:29:00 metrics Web threads (busy/total/listen/idle/delay/drop/handle): 12/42/25/5/0/0/149 Sessions: 69 Total Memory=1041856 Free=877038

04/01 18:30:00 metrics Web threads (busy/total/listen/idle/delay/drop/handle): 13/42/25/4/0/0/147 Sessions: 74 Total Memory=1042240 Free=874367

IIS Logs:

2011-01-04 18:26:26 GET / - - XX.XX.XX.XX HTTP/1.1 503 137
2011-01-04 18:26:30 GET /fdsfsdqfsdqfsdqfqsdqf/83451815_0.jpg - - XX.XX.XX.XX HTTP/1.1 503 137
2011-01-04 18:31:49 POST /fdsfsdfdsf/upload.cfm var=value - XX.XX.XX.XX HTTP/1.1 503 137

The 1st is a CFM ressource, the 2nd is a static ressource, and the last one is a POST to a CFM ressource. For the 3 requests I have the same stack trace on JRun log I've put on my first post.

The JRun logs have no entry about a memory issue, and it seems to appear at random moment. Obviously I've more frequent under heady load, but when you find a 503 at the timestamp "2011-01-01 01:43:03 jrISAPI[7164:2380]  returning error page for JRun too busy or out of memory", I think there is not a lot of people that spent time on our website on the new year night

There is no scheduled task in the server, and I didn't find any hs... file in my runtime\bin directory (never crash?).

I'm lost...

Can you explain me the value handledRq (the last one) on my metrics logs ??

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Engaged ,
Jan 04, 2011 Jan 04, 2011

Copy link to clipboard

Copied

I believe that handledRq refers to how many requests were handled or

processed to completion during the last metric logging interval (1 minute).

The metrics logging shows that memory is fine (free remains high), and that

you have about 10-14 active requests at the moment of logging, with a high

water mark of 42 handler threads with 5 or 6 that have been inactive long

enough to be considered idle. The requests handled per minute has a 50%+

increase between 18:24 and 18:27, a minor surge, but still none shown for

delay.

Seven years ago I once blogged about how if JRun queues requests and those

queued requests timeout there without ever running that JRun will return a

503 status code with the error message "returning error page for JRun too

busy or out of memory". However, it doesn't appear to be the case here

since your JRun metrics shows 0 for delay.

The information you provided so far has been helpful, but at this point

there is still not enough for me to make a clear conclusion. As the next

step I'd take several "snapshots" from the Server Monitor while the 503

errors are occurring (snapshots are triggered via a web request, so

generating a thread dump directly on the OS to bypass the web may be a

better option). The snapshots/threaddumps will show you exactly what

threads are running, how many, what they are doing down to the line of code

(java source code, not just cfml code).

Also, you may also want to contact a ColdFusion consultant to help you

troubleshoot this for you further. I recommend FigLeaf.com or go2ria.com.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Engaged ,
Jan 04, 2011 Jan 04, 2011

Copy link to clipboard

Copied

Although I'm certainly willing to continue trying to help out here with the

problem under discussion, this form of help might be to slow for you. So

once again I'll suggest that if this is an urgent business need, then it's

really in your best interest to hire a dedicated and experienced consultant

to help resolve this.

In addition to the consultants I mentioned earlier, I'll point out that

Charlie Arehart maintains a very helpful resource center, CF411.com,

intended to help people in just this situation. One of the resources is a

larger list of CF consulting companies. Charlie himself is known to be

among the very best at diving into all manner of ColdFusion performance

issues.

See this for more:

CF-oriented Troubleshooting Consultant

http://www.cf411.com/#cfassist

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Community Expert ,
Jan 04, 2011 Jan 04, 2011

Copy link to clipboard

Copied

Thanks for that, Steven. I've stayed quiet in this thread because so many others had jumped in and were offering useful info.

I'll add this, though: erloz, you refer to errors you're seeing in the web server connector logs. What are you seeing instead in the \runtime\logs (or \logs, if on Multiserver)? I know you've seen errors related to non-CF files, but given the wildcard filter in IIS and how the connector does seem to at least look at non-CF files briefly, what you may see in the connector logs may be more of the result of a more serious error having happened first in processing CF pages that may make it non-responsive to the connector. That's a bit of a wild guess, but it's worth checking. In particular, look in those logs for any "outofmemory" errors (search for that specific string).

I have a blog entry that helps address some of what you may see if you find them:

http://www.carehart.org/blog/client/index.cfm/2010/11/3/when_memory_problems_arent_what_they_seem_part_1

Finally, you don't make it clear: are these errors happening on ALL requests, all the time? You referred to it as part of something happening on an upgrade from CF 8 to 9. Has this been since the upgrade? Constant? Periodic? Any additional info may help.

/charlie


/Charlie (troubleshooter, carehart.org)

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Guide ,
Jan 05, 2011 Jan 05, 2011

Copy link to clipboard

Copied

Hello,

Looking thru the thread you do not say what is the bit 32 or 64? I mention because on 64 bit I have noticed the default MaxPermSize inadequate, which leads to many major Garbage Collections - part of major GC being stop all Java process hence bad for performance.

With the Java messages in the OUT log I am curious as to the JVM arguments in CF\runtime\bin\JVM.CONFIG?

I am not entirely sure CF JVM is the cause of problem however I would not discount it particularly since the jrun_iis6_wildcardYYMMDD.log - returning error page for JRun too busy or out of memory (tho I not seen that one before myself).

HTH, Carl.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Guest
Jan 05, 2011 Jan 05, 2011

Copy link to clipboard

Copied

@Charlie:

in the cf\runtime\logs there is only that kind of information:

05/01 10:28:44 error unexpected end of part
java.io.IOException: unexpected end of part
    at com.oreilly.servlet.multipart.PartInputStream.fill(PartInputStream.java:96)
    at com.oreilly.servlet.multipart.PartInputStream.read(PartInputStream.java:191)
    at com.oreilly.servlet.multipart.PartInputStream.read(PartInputStream.java:152)
    at com.oreilly.servlet.multipart.FilePart.write(FilePart.java:257)
    at com.oreilly.servlet.multipart.FilePart.writeTo(FilePart.java:215)
    at coldfusion.filter.FormScope.fillForm(FormScope.java:252)
    at coldfusion.filter.FusionContext.SymTab_initForRequest(FusionContext.java:376)
    at coldfusion.filter.GlobalsFilter.invoke(GlobalsFilter.java:33)
    at coldfusion.filter.DatasourceFilter.invoke(DatasourceFilter.java:22)
    at coldfusion.filter.CachingFilter.invoke(CachingFilter.java:53)
    at coldfusion.filter.RequestThrottleFilter.invoke(RequestThrottleFilter.java:126)
    at coldfusion.CfmServlet.service(CfmServlet.java:200)
    at coldfusion.bootstrap.BootstrapServlet.service(BootstrapServlet.java:89)
    at jrun.servlet.FilterChain.doFilter(FilterChain.java:86)
    at coldfusion.monitor.event.MonitoringServletFilter.doFilter(MonitoringServletFilter.java:42)
    at coldfusion.bootstrap.BootstrapFilter.doFilter(BootstrapFilter.java:46)
    at jrun.servlet.FilterChain.doFilter(FilterChain.java:94)
    at jrun.servlet.FilterChain.service(FilterChain.java:101)
    at jrun.servlet.ServletInvoker.invoke(ServletInvoker.java:106)
    at jrun.servlet.JRunInvokerChain.invokeNext(JRunInvokerChain.java:42)
    at jrun.servlet.JRunRequestDispatcher.invoke(JRunRequestDispatcher.java:286)
    at jrun.servlet.ServletEngineService.dispatch(ServletEngineService.java:543)
    at jrun.servlet.jrpp.JRunProxyService.invokeRunnable(JRunProxyService.java:203)
    at jrunx.scheduler.ThreadPool$DownstreamMetrics.invokeRunnable(ThreadPool.java:320)
    at jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.java:428)
    at jrunx.scheduler.ThreadPool$UpstreamMetrics.invokeRunnable(ThreadPool.java:266)
    at jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)

05/01 10:28:44 error (JRun Service: ProxyService [jrun.servlet.jrpp.JRunProxyService@16c00e2]) JRunPRoxyServer.invokeRunnable:
java.lang.IllegalStateException
    at jrun.servlet.JRunResponse.getWriter(JRunResponse.java:205)
    at jrun.servlet.JRunResponse.sendError(JRunResponse.java:597)
    at jrun.servlet.JRunRequestDispatcher.invoke(JRunRequestDispatcher.java:328)
    at jrun.servlet.ServletEngineService.dispatch(ServletEngineService.java:543)
    at jrun.servlet.jrpp.JRunProxyService.invokeRunnable(JRunProxyService.java:203)
    at jrunx.scheduler.ThreadPool$DownstreamMetrics.invokeRunnable(ThreadPool.java:320)
    at jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.java:428)
    at jrunx.scheduler.ThreadPool$UpstreamMetrics.invokeRunnable(ThreadPool.java:266)
    at jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)

java.lang.IllegalStateException
    at jrun.servlet.JRunResponse.getWriter(JRunResponse.java:205)
    at jrun.servlet.JRunResponse.sendError(JRunResponse.java:597)
    at jrun.servlet.JRunRequestDispatcher.invoke(JRunRequestDispatcher.java:328)
    at jrun.servlet.ServletEngineService.dispatch(ServletEngineService.java:543)
    at jrun.servlet.jrpp.JRunProxyService.invokeRunnable(JRunProxyService.java:203)
    at jrunx.scheduler.ThreadPool$DownstreamMetrics.invokeRunnable(ThreadPool.java:320)
    at jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.java:428)
    at jrunx.scheduler.ThreadPool$UpstreamMetrics.invokeRunnable(ThreadPool.java:266)
    at jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)

There is no line about a memory issue!

These errors happening on random request not ALL!! (arround 0.05%), and there is no real link between load and this... Our applications handle a lot of POST download, but like I've described before, error can occured on classic cfm page and static ressource...

What I mean with upgrade CF8 to CF9, is with our 10 other servers running on CF8, I've never see that 503 issue outside a real jrun problem (like a memory crash, restart, etc...)

@Carl:

Server is running on WIN2003R2SP2 32 bits OS (dual Xeon E5506, 4Go RAM). Here is the jvm.config arguments:

# Arguments to VM
java.args=-server  -Xms1024m -Xmx1024m -Dsun.io.useCanonCaches=false -XX:PermSize=256m -XX:MaxPermSize=512m -XX:+UseParallelGC -Dcoldfusion.rootDir={application.home}/../ -Dcoldfusion.libPath={application.home}/../lib -Dcoldfusion.classPath={application.home}/../lib/updates,{application.home}/../lib,{application.home}/../gateway/lib/,{application.home}/../wwwroot/WEB-INF/flex/jars,{application.home}/../wwwroot/WEB-INF/cfform/jars

Do you know if it should be possible to try old JRun connector used by CF8 ?

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Engaged ,
Jan 05, 2011 Jan 05, 2011

Copy link to clipboard

Copied

A couple points based on latest discussion:

1) Permanent Generation size is not a problem unless you start seeing

"java.lang.OutOfMemoryError PermGen Space".

2) People keep referring to memory errors because the error message suggests

it as a possibility, "JRun too busy or out of memory". But that's just the

connector taking a guess as to why JRun server isn't responding to the

connector.

3) You should not use a CF8 connector for a CF9 server because the connector

is versioned for JRun builds, so the hooks or communication is tightly

coupled between them. Attempting to do that will likely cause additional

problems.

In the past I demonstrated how the 503 status code with the "JRun too busy

or out of memory" message can be caused by JRun requests timing out in the

queue before getting to run. But based on the JRun metrics log you provided

it did not appear to be the problem because requests were not queuing. I

still think you need a more comprehensive analysis to include settings

review, log review, and snapshot/thread dump analysis.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Engaged ,
Jan 05, 2011 Jan 05, 2011

Copy link to clipboard

Copied

Continuing to research this, I ran across this thread which discusses how a

503 status code might be returned when a user is performing a file upload

and the user cancels the request. More details at:

http://www.houseoffusion.com/groups/cf-talk/thread.cfm/threadid:38705

Perhaps the associated static content 503 errors are occur from the same

user, say there was a form to upload where the action page had static

content like css, jpg, js, etc, and the form upload cancellation caused all

resource requests for the action page to return a 503. Just a hunch.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Community Expert ,
Jan 05, 2011 Jan 05, 2011

Copy link to clipboard

Copied

Erloz, are you saying you just eyeballed the logs (to find any outofmemory errors) or that you searched the logs? You really must do the latter, if you want to be absolutely sure it's not an issue. You could miss them on one line, especially with all that other stuff (which in my experience is more from the result of other problems, not the cause of them). Again, I address how best to do that search in my blog entry.

Steven's idea about stopped uploads is certainly interesting, too, if there's some limit to when they happen for you.

Finally, have you searched your web server logs to find the requests getting 503 errors? You may find a patter, whether in the domain, the filename, the IP address, the request method, the time, etc.

/charlie


/Charlie (troubleshooter, carehart.org)

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Guest
Jan 05, 2011 Jan 05, 2011

Copy link to clipboard

Copied

@Charlie:

Yes I've made a big "grep" on my logfile, and no result about memory...

I've just discovered a very strange behaviour:

I've created a page which generate an oufofmemory:

bigoutput.cfm:

<cffile action="read" variable="toto" file="C:\bigfile.content">

<cfoutput>#toto#</cfoutput>

This generate a 500 error. OK.

The next call to a CFM ressource (or something map to servlet) is everytime a 503 !!!!!

Example:

erlo@bacon:~/debugcf$ wget http://server/bigoutput.cfm
--2011-01-05 18:02:56--  http://server/bigoutput.cfm
Resolving server... XX.XX.XX.XX
Connecting to server|XX.XX.XX.XX|:80... connected.
HTTP request sent, awaiting response... 500 Internal Server Error
2011-01-05 18:03:00 ERROR 500: Internal Server Error.

erlo@bacon:~/debugcf$ wget http://server/calltoapagewhichdoesnotexist.cfm
--2011-01-05 18:03:47--  http://server/calltoapagewhichdoesnotexist.cfm
Resolving server... XX.XX.XX.XX
Connecting to serverXX.XX.XX.XX|:80... connected.
HTTP request sent, awaiting response... 503 Server Error
2011-01-05 18:03:47 ERROR 503: Server Error.

In my example, I call a ressource which does not exist (but is map to a servlet) arround 50 seconds after the previous error. I can reproduce this everytime.. It is the same behaviour with an existing cfm ressource, but not with a static ressource.

This seems to be a bug, and does not explain the 503 on static content, and for the moment I cannot find a 500 just before my list of 503 error on production environnement.

edit:

I've tried on CF8 server, and I cannot reproduce this behaviour.


A think it is really "connector related": if I stop coldfusion after my 500 error, I can have the same immediate 503 error..... the timeout produced by stopping the service is visible only on the 2nd request..... ARGH

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Engaged ,
Jan 05, 2011 Jan 05, 2011

Copy link to clipboard

Copied

If you monitor the total heap size (used/allocated/max) and watch for

garbage collection, it might be that when the server hits an

outofmemoryerror it will continue to not respond to the connector until GC

kicks in and frees some memory up. I suspect that GC runs and completes

just before the server starts processing requests again. HTTP 503 status

code is kind of a catch all for server not available, which would be the

case after an OOME for a brief period.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Community Expert ,
Jan 05, 2011 Jan 05, 2011

Copy link to clipboard

Copied

That's an interesting observation, but then I would ask: are you seeing the outofmemory error in the runtime log when you run the test? If so, then if your theory was correct, you'd expect to have found other outofmemory errors from the preceding times this was happening.

So if you don't mind, while you may want to focus on the discovery, I really think it will help you (and others here) diagnostically if you would check your grep. You should certainly be finding the oom from your test, right? If not, maybe you had a problem with how you were grepping/searching for the errors. Try it again. Again, just trying to help, not making any accusations.

/charlie


/Charlie (troubleshooter, carehart.org)

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Guide ,
Jan 05, 2011 Jan 05, 2011

Copy link to clipboard

Copied

ARGH indeed,

JVM args look OK. There is one thing I like to mention with the Xms and Xmx values set to the same. Many times this is fine to do that is set the initial heap to the same as maximum heap. With that setting sometimes I have seen issues where the heap, which is made up of NewGen and OldGen, make poor decisions for sizing NewGen. The problem occurs where NewGen which has Eden and two Survivor Spaces sets itself to a large size, OldGen is occupied enough, a minor Garbage Collection occurs and objects that are going to tenure (move from NewGen survivor space to OldGen) can not fit in the space as the tenure survivor space objects are more than what is free in OldGen.  To overcome that happening make a decision for the JVM and set a size for NewGen with either a ratio  or Xmn switch. To know if such is occurring you would need to enable JVM logging. Caveat - I do not think you are having a JVM issue with the server and JVM args in use, so I do not like to waste your time and add to frustration if it is the wrong path to take to rectify the matter.

Interesting observation re works ok on CF8 but errors on CF9. You are at least CF9 update1?

HTH, Carl.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Engaged ,
Jan 05, 2011 Jan 05, 2011

Copy link to clipboard

Copied

This blog entry shows how to configure the sizing of the New Generation's

Eden and Survivor spaces:

http://www.webapper.com/blog/index.php/2006/06/08/20060606021131/

With these you can set the min and max heap the same while not worrying

about the generation size issues. Setting the min/max heap the same is known

to improve performance as the primary benefit.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Guide ,
Jan 05, 2011 Jan 05, 2011

Copy link to clipboard

Copied

Hello,

One other  thing with JVM. On 32 bit there is a maximum at which point Jrun.exe (ie ColdFusion) will stop responding. With Max heap 1024 (Xmx1024) and MaxPermSize 512 perhaps both committed (however not necessarily holding objects) then your going to run over the 32 bit boundary as there are other buffers holding memory eg Code Cache, and some DLL’s.

Here is reference from Oracle as to limit:
http://www.oracle.com/technetwork/java/hotspotfaq-138619.html#gc_heap_32bit


Using TASKMAN what memory is Jrun.exe using when you get the problem?

HTH, Carl.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Guest
Jan 06, 2011 Jan 06, 2011

Copy link to clipboard

Copied

@Charlie:

Yes, when I voluntary generate the 500 error, I can see the OOM info in my log:

06/01 11:28:57 error ROOT CAUSE:
java.lang.OutOfMemoryError: Java heap space

It is more difficult to simulate, but I think I can have the same behaviour by stopping a download/upload. I will try this case as soon as possible.

I've notice other strange things:

  • When a enable "verbose" in connector setting, I cannot reproduce the "503" error
  • I can reproduce the error on every CF9 install (the connector/jrun version is always the same: 108795)
    • Win2003/II6 with CF9/CF9.0.1 Entreprise/Standard
    • Win2008/II7 with CF9/CF9.0.1 Standard (not tested with Entreprise)

Other element:

After the 500 error: JRun want to close the link with connector, but the connector is staying in "CLOSE_WAIT" status (that means the connector does not destroy the socket) while the JRUN is wainting for the "FIN" packet from Connector (FIN_WAIT2)...

If I manualy kill the socket (with tcpview on windows for example), the behaviour is normal, the request after is not a 503.

If I do not kill the socket, the next request is everytime a 503 (for mapped ressource), and the connection to JRun is re-initialize.

On CF8, after the same 500 error, the link between the connector and JRun is always ESTABLISHED.

Is anyone can try to reproduce the case ?

Is this issue could come from the "new" pool system of the connector ?

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Community Expert ,
Jan 06, 2011 Jan 06, 2011

Copy link to clipboard

Copied

Good detective work, Erloz. As for others reproducing it, you may find that more difficult to expect from folks here, from my experience. Not many have enough understanding of the connector to really help, or may not have the time/motivation to setup and analyze test cases (myself included). But we'll see if any step up.

Really, though, you have enough info and detailed observation now to approach Adobe support. Sure, you have to pay but if they confirm it a bug, your money is refunded. That said, it could end up that you instead have some config error, somewhere (CF, Jrun, web server, connector, OS). Since your problem doesn't seem something that others are experiencing widely, you have to consider that possibility.

But if you want resolution, even at the risk of it not being a free call, it seems that contacting Adobe may now be the best next step, unless someone else dives in here. Sadly, you can't really expect that someone from Adobe will see this thread. These are not official support forums.

But certainly some will be interested to hear whatever resolution you find, if you'll be so kind to share it even if you have to pay (since we all contribute what time we do here, trying to help, purely on a volunteer basis.)

One last question, though: you refer to a "'new' pool system of the connector'. What's that in reference to? I just don't recall someone asserting that, nor have I heard of some change in 9.

/charlie


/Charlie (troubleshooter, carehart.org)

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Guest
Jan 06, 2011 Jan 06, 2011

Copy link to clipboard

Copied

First, thanks to people who have spent his time on my case.

About the connector of CF9, I've read informations at the bottom of the PDF: http://www.adobe.com/products/coldfusion/pdfs/cf9_performancebrief_ue.pdf

Here is a quote of this document:

IIS maintains the worker thread pool size of 256 per CPU. (Some threads are used internally by IIS). When a request arrives, one thread is picked from the IIS worker thread pool and used until the request is completed. When a request arrives for the IIS thread, the request is blocked until ColdFusion serves the CFM page. Meanwhile, if a request comes in for other content types like HTTP or images, IIS uses other threads in the
pool to serve them. If all IIS worker threads are being blocked by ColdFusion requests, there will be no threads immediately available for IIS to serve pages of other content types.
When a connector receives a request in ColdFusion 9, it picks the native thread from its thread pool and returns the IIS worker thread with IO Completion (the request will be in IO block state). Once the IIS request is freed from the request, it is immediately available to handle the next request and to serve pages of other content types. Once the request is completed by ColdFusion, the connector thread is returned to its native thread pool.
Users can tune their applications using different native connector thread pool size values. By default, the native connector thread pool size is 25. This thread pool is defined as “maxworkerthread” in file [coldfusion_home]\runtime\lib\wsconfig\1\jrun_iis6_wildcard.ini.

If I've a solution for this problem, I will come back to explain it here.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Community Expert ,
Jan 06, 2011 Jan 06, 2011

Copy link to clipboard

Copied

Sure, but Erloz, are you saying that that's what you thought was something "new"? It's not, as far as I know. That's just the perf brief for CF9. It's not all necessarily that the things being discussed were necessarily all "new". Again, just trying to help. Will look forward to hearing more that you may learn.

/charlie


/Charlie (troubleshooter, carehart.org)

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Resources
Documentation