We are running the latest CF 9 server on JVM 1.6.0_26 on a Win2003 server with an i7 processor and 8 GB of RAM.
Here is the JRun config:
java.args=-server -Xms4096m -Xmx4096m -Dsun.io.useCanonCaches=false -XX:PermSize=512m -XX:MaxPermSize=512m -XX:+UseParallelGC -Dsun.rmi.dgc.client.gcInterval=600000 -Dsun.rmi.dgc.server.gcInterval=600000 -Dcoldfusion.sessioncookie.httponly=true -XX:NewRatio=3 -Xbatch -Dcoldfusion.rootDir={application.home}/../ -Dcoldfusion.libPath={application.home}/../lib -Dcoldfusion.classPath={application.home}/../lib/updates,{application.home}/../lib,{application.home}/../gateway/lib/,{application.home}/../wwwroot/WEB-INF/flex/jars,{application.home}/../wwwroot/WEB-INF/cfform/jars
For the past few weeks, every couple of days the CF server grinds to a halt.
Using SeeFusion we can monitor the requests and see them just starting to stack up.
We are typically alerted to the brewing problem when our application starts sending notices
that SESSION variables are undefined. The interesting part is that the line
where the error occurs typically comes after a check that the variable is defined.
For instance:
<cfif NOT IsDefined("SESSION.User")>
<cflocation url="somewhere">
</cfif>
Hi <cfoutput>#SESSION.User.GetUsername()#</cfoutput>
This reports the error USER IS UNDEFINED IN SESSION on the output line, AFTER the variable has been checked for existence, which suggests
to me that somewhere in the middle of processing the thread, memory is getting corrupted.
Anyway, after starting to see random errors like this we log into SeeFusion and see that
memory usage is running at about 85% and simple page requests are stacking up.
I can force a full GC cleanup in milliseconds but it doesn't do anything for memory usage.
The page response times begin to climb.
At first we thought it might be some long-running page or report on our site, but looking at the actively
running requests we see nothing intensive that could be causing the issue. Looking at the
task manager, the processes on the server are all running at 0% except for JRun, which is hovering around
15% to 18%.
The problem isn't in the database either. Our MySQL database shows no long-running queries, hung processes, or crashed tables
that the application could be stalling over.
All of this leads up to the site slowing to a crawl and then becoming completely unresponsive while JRun chugs along
at 15% and memory never maxes out. This never causes any errors in the logs, i.e. no heap memory errors or connection timeouts.
It just crawls along. I've never let it sit in this state for more than 5 or 10 minutes, so I don't know if it would eventually come back.
The only way so far to bring it back is to restart the CF server, at which point everything returns to normal.
In other situations like this I've seen JRun peg out at 100% CPU, or memory pegged at 100% with an eventual heap error,
or the database locked up and causing the app problems. But none of that happens here.
I'm truly stuck as to how to continue to diagnose and fix this problem.
Any help would be awesome. Thanks.
You can find out what the requests are "stuck doing" using the stack trace feature which is available in SeeFusion (or FusionReactor, or the CF Server Monitor), and which can tell you the line of code that a CF page is running at any moment. Since you've ruled out many of the other common things (outofmemory errors in the logs, etc.), this seems your best next bet. See what all the hung requests are doing (particularly as a given request remains hung and repeated stack traces show the same line of code).
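To make the idea concrete, here is a minimal Java sketch of what a stack trace snapshot gives you. It is not how SeeFusion, FusionReactor, or the CF Server Monitor implement the feature; the class name `StuckFrameCounter` and its method are made up for illustration. It groups threads by the frame at the top of their stack, so many threads sitting on the same frame stand out. Note that it only inspects the JVM it runs in, so to look at ColdFusion it would have to run inside the CF JVM.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts how many threads share the same top stack frame.
public class StuckFrameCounter {

    /** Returns a map of "top frame" -> number of threads currently at that frame. */
    public static Map<String, Integer> countTopFrames(Map<Thread, StackTraceElement[]> traces) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (StackTraceElement[] stack : traces.values()) {
            if (stack.length == 0) {
                continue; // thread has no Java frames to report
            }
            String top = stack[0].toString();
            Integer seen = counts.get(top);
            counts.put(top, seen == null ? 1 : seen + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Thread.getAllStackTraces() snapshots every live thread in this JVM.
        Map<String, Integer> counts = countTopFrames(Thread.getAllStackTraces());
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println(entry.getValue() + " thread(s) at " + entry.getKey());
        }
    }
}
```

Taking a few snapshots some seconds apart and comparing them is the useful part: a frame that keeps the same high count across snapshots is a likely place where requests are stuck.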
I discuss this more in some resources:
http://www.carehart.org/blog/client/index.cfm/2009/6/24/easier_thread_dumps
http://www.carehart.org/blog/client/index.cfm/2010/10/15/Lies_damned_lies_and_CF_timeouts
(see the section "The underlying solution: stack tracing")
http://carehart.org/presentations/#stack
Hope that helps. Of course, you can also enlist the help of someone who does such CF server troubleshooting for a living. I do (http://www.carehart.org/consulting/), as do others, which I list as a category in my CF411 site: http://www.cf411.com/cfconsult. Hope that's helpful.
/charlie
Shameless plugs
Akersha, since your "shameless plugs" comment is indicated as being a reply to my one comment in this thread (from Oct 2011), are you referring to the fact that I mention there are folks who can help with such troubleshooting, including myself? You really regard that as "shameless"? When I offer a link to several other companies who do it also? Would you have preferred I remained silent on it, and left the readers to dig all over the net to find who might be able to help?
Not all questions can be easily answered in back and forths on forums or mailing lists. Some people would rather get more immediate help. Giving them resources to consider is not shameless.
/charlie
>Not all questions can be easily answered in back and forths on forums or mailing lists. Some people would rather get more immediate help. Giving them resources to consider is not shameless.
And, what's more, sometimes it's a better approach to just pay someone to fix something, rather than messing about on a forum. It's good advice to suggest to people that one option to get things fixed quickly is to pay someone to fix it for you.
That said Charlie: I wouldn't feed the trolls if I was you.
--
Adam
Thanks for the support, Adam.
As for any concern over "feeding the trolls", I see it more as putting out food to trap 'em, and I'm loaded for bear. ;-} But seriously, I meant my question and pressing of the commenter sincerely. I don't think it wise to leave an unsubstantiated comment out there without any reply. We'll see what comes of it. Thanks again for your consideration.
/charlie
Hello Charlie, I know this post is old but here I am suffering from CF9 hanging.
I'll be completely open here:
The system is running on an i7 machine with 8 GB of RAM and Windows 7 Pro. However, it's running on the built-in CF web server (for the moment, soon to be changed to Apache), for more than 25 users.
Besides this, is there any "tuning CF9 for production" guide for developers, not server administrators?
Any ideas or suggestions?
Thank you!
Hello,
Perhaps you could do well to post a new thread rather than reply to one many years old.
Regards, Carl.
Thank you Carl, I guess I am not very familiar with forum best practices (and common sense).
I'll take your suggestion.
Thank you!
Hi,
Other than what has been mentioned I think three other things may help:
-JVM log details
-client variables
-other logs
While JVM logging will not resolve the matter, it may show whether one of the Java generations is having a problem, so that you can adjust the settings to cope with your particular load.
JVM args= ...-XX:+UseParallelGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -verbose:gc -Xloggc:cfjvmGC.log -Dsun.rmi.dgc.client.gcInterval=600000 etc.
Back up JVM.CONFIG first, then make the changes on a single line, without carriage returns or line feeds. Use the example here as a reference only.
This creates the log file ColdFusion9\runtime\bin\cfjvmGC.log, or writes it to Jrun4\bin\ on a multiserver install.
Read the log file directly or use a graphical tool to help, e.g. the GCViewer tool from:
http://www.tagtraum.com/gcviewer.html
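As a complement to the GC log, the same per-generation picture can be read from the JVM's standard management beans. The following is a minimal Java sketch under some assumptions: the class name `GenerationReport` and the helper `percentUsed` are made up for illustration, and the program reports on the JVM it runs in, so it would need to run inside the CF JVM (or be adapted to connect over JMX) to say anything about ColdFusion. Pool and collector names vary with the GC algorithm in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper: prints usage per memory pool (generation) and GC totals.
public class GenerationReport {

    /** Percentage of a pool's limit in use, or -1 when the JVM reports no limit. */
    public static long percentUsed(long usedBytes, long maxBytes) {
        return maxBytes <= 0 ? -1 : (usedBytes * 100) / maxBytes;
    }

    public static void main(String[] args) {
        // One pool per generation or region, e.g. eden, survivor, old gen, perm gen.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            System.out.println(pool.getName()
                    + ": used=" + (usage.getUsed() >> 20) + "MB"
                    + " committed=" + (usage.getCommitted() >> 20) + "MB"
                    + " percentOfMax=" + percentUsed(usage.getUsed(), usage.getMax()));
        }
        // Cumulative collection count and time since JVM start, per collector.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms total");
        }
    }
}
```

A pool that stays near 100% of its maximum right after a full collection is the one worth looking at first.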
Adobe recommends storing client variables in a database, see:
One of Charlie's blog references suggests checking the runtime JRun logs; this is often worthwhile, so I would like to add emphasis to it by way of repetition. Read the log files coldfusion-event and coldfusion-out in \ColdFusion9\runtime\logs (or Jrun4\logs\). Examine these for possible hang causes such as java.lang.OutOfMemoryError or java.lang.StackOverflowError messages.
HTH, Carl.
I'll try enabling logging as you have suggested.
I believe, as you have suggested, that the problem may be in one of the Java generations. As I mentioned, the server logs don't show any memory or stack errors when the machine goes unresponsive. Using SeeFusion I've watched the memory climb on the machine from a constant 80% to up to 98% while still running, and stay there indefinitely. Interestingly, SeeFusion shows that active requests, queries, and pages per second all remain fairly constant and relatively modest. Using SeeFusion to force a GC recovers no memory. Our application makes heavy use of CFCs and OO structures. It makes me think that something (or many things) somewhere is being retained in memory and not being properly released.
Any thoughts about how to confirm or deny this?
ps We don't use client variables
I did a talk last year at CFMeetup - a great resource hosted care of Charlie - on CF JVM and logging related matters. Think you may benefit by reviewing the session:
http://experts.adobeconnect.com/p55663036/
Hope that helps again, Carl.
Carl, thank you for posting the adobeconnect link!
I had the same problem, server hangs were happening daily, sometimes 2- times.
I changed the -XX argument to -XX:+UseParNewGC and added the -Xincgc argument. It looks good, we just passed the 24 hour mark with no hang-ups!
I have a question though about the -Xmx argument. The speaker was displaying his jvm.config file during the session and he had his set to 512m. Mine installed at a default setting of 4096m. Do you know if that is wise to change to 512, or what the advantages/disadvantages would be? Much of the discussion was over my head and this issue was not explored in depth.
My System Specs:
CF 9,0,1,274733
Java 1.6.0_14
OS Win 2008 R2 SP1
RAM 8gig
CPU dual 2.53GHz
Any advice is much appreciated.
My current argument set; working well:
# Arguments to VM
java.args=-server -Xmx4096m -Dsun.io.useCanonCaches=false -XX:MaxPermSize=192m -XX:+UseParNewGC -Xincgc -Xbatch -Dcoldfusion.rootDir={application.home}/../ -Dcoldfusion.libPath={application.home}/../lib -Dcoldfusion.classPath={application.home}/../lib/updates,{application.home}/../lib,{application.home}/../gateway/lib/,{application.home}/../wwwroot/WEB-INF/flex/jars,{application.home}/../wwwroot/WEB-INF/cfform/jars
The Xmx switch controls the maximum heap memory available to the JVM, and you certainly don't want to just reduce it to 512 MB without knowing about your own environment. Are you running a 32- or a 64-bit system? My guess is you're running a 64-bit system, which can address significantly larger amounts of memory (which is why you have 8 GB of RAM).
Dave Watts, CTO, Fig Leaf Software
Yes, 64-bit.
The speaker must have been using a development machine for his demo.
Thanks for the information, I'll leave mine at 4096 and hope for many more days with no more hang-ups.
Thanks
Hi Lyndon
Ditto what Dave points out - do not decrease Xmx.
You could do well to:
-add an initial heap size with Xms
-set an initial size for PermGen with PermSize
-set a higher maximum for PermGen with MaxPermSize
Without log evidence it can be hard to know whether things have improved or which parameters need changing, tho getting some more uptime is a good thing.
For example only. Take care with changes to JVM.CONFIG:
java.args=-server -Xms2048 -Xmx4096m -Dsun.io.useCanonCaches=false -XX:PermSize=192m -XX:MaxPermSize=512m -XX:+UseParNewGC -Xincgc -Xbatch -Dcoldfusion.rootDir={application.home}/../ -Dcoldfusion.libPath={application.home}/../lib -Dcoldfusion.classPath={application.home}/../lib/updates,{application.home}/../lib,{application.home}/../gateway/lib/,{application.home}/../wwwroot/WEB-INF/flex/jars,{application.home}/../wwwroot/WEB-INF/cfform/jars
HTH, Carl (aka the speaker on that talk session)
That was an outstanding briefing Carl!
I highly recommend it for anyone needing to tune up their CF server.
I did get your point about "backing up the jvm.config"! I'll proceed with caution.
Thanks again for all your help.
Lyndon
Carl,
Small typo in the -Xms argument
-Xms2048 should be -Xms2048m
CF would not start until I found that.
Do you by any chance use the CFImage tag? I've been having this problem for several years with CF8 and just set up a brand new 64-bit server with CF9 and within an hour after starting it up, it is also locking up. I am a heavy user of CFImage.
No we don't use the cfimage tag.
I went through Carl's webcast on tuning the JVM and put a number of the tweaks and settings to use after profiling live traffic for a few weeks.
While there was some improvement, the problem persists. Memory slowly creeps upwards and then hovers at 90%+ usage
even off peak hours with very little traffic. At that point it is a matter of time before the right traffic spike kicks the server into unresponsive mode.
It seems that either something we are doing is being held in memory and never released or there is a problem with the garbage collector.
Forcing manual garbage collection never seems to reclaim any memory.
In the short term we have just set up cron jobs to restart the CF server every night during off-peak hours, which frees up memory again for the next day.
Not ideal in any sense but working in the short term.
Hi,
The total memory reaches maximum and a forced major GC from SeeFusion does not release any of it. It would be interesting to know how much of the memory is actually occupied by objects versus how much is free (that is, allocated to Java / CF but not holding objects).
The JVM log details would show this, tho they can be hard to understand. You have the JDK installed, so you may well benefit from some analysis with JConsole or JVisualVM, both found in \Java\jdk1.6.0_26\bin. To do that, add these to the JVM args:
-Dcom.sun.management.jmxremote.port=N (N=port number)
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
EG:
JVM args= ...-XX:+UseParallelGC -Dcom.sun.management.jmxremote.port=8705 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dsun.rmi.dgc.client.gcInterval=600000 etc.
Back up JVM.CONFIG first, then make the changes on a single line, without carriage returns or line feeds. Use the example here as a reference only.
Of interest will be the JConsole Memory tab and the JVisualVM heap and PermGen charts; paste them to the thread. Caveat: while the jmxremote JVM args are present you will not be able to stop the CF application service from SERVICES.MSC, so you will need to kill the Jrun.exe task. I worry that the cron restart you mentioned might fail.
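If a GUI is not convenient, the same heap figures can be read over that JMX port with a few lines of Java. This is a minimal sketch under stated assumptions: the class name `RemoteHeapCheck` and its helpers are made up for illustration, the host defaults to localhost, and the port defaults to 8705 to match the example args above. It relies on authentication and SSL being disabled as in those args, which is only reasonable on a protected network.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper: reads heap usage from a JVM started with jmxremote args.
public class RemoteHeapCheck {

    /** Builds the standard RMI-based JMX URL for a host and jmxremote port. */
    public static String serviceUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    /** Formats used, allocated-but-free, committed and max heap in whole MB. */
    public static String describe(long used, long committed, long max) {
        long mb = 1024L * 1024L;
        return "used=" + (used / mb) + "MB allocated-but-free=" + ((committed - used) / mb)
                + "MB committed=" + (committed / mb) + "MB max=" + (max / mb) + "MB";
    }

    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";           // assumed default
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8705;   // port from the example args
        JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(serviceUrl(host, port)));
        try {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            // Proxy for the remote JVM's memory bean.
            MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                    connection, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
            MemoryUsage heap = memory.getHeapMemoryUsage();
            System.out.println(describe(heap.getUsed(), heap.getCommitted(), heap.getMax()));
        } finally {
            connector.close();
        }
    }
}
```

Running it once while the server is healthy and again when memory sits at 90% gives two comparable lines to post to the thread.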
JConsole and JVisualVM also have a full GC button which might be worth a try, however I expect the SeeFusion GC performs the same task.
You're worried the garbage collector is not working, and indeed you can change the GC algorithm entirely. However, before offering such a suggestion I would like to see some of the JVM log or jmxremote details from before and when the memory reaches 90%.
One other thing: are Win03 and the JDK both 64-bit?
HTH, Carl.
PS
A limited but alternative way to see how much of the JVM memory is in use versus the total would be to open the CF9 Server Manager and select the Details view. It does not show the cost of GC or any PermGen information, tho it needs no extra JVM args or CF restart.
eg [screenshots: Quick View; Details View (part)]
>JConsole and JVisualVM also have a full GC button which might be worth a try, however I expect the SeeFusion GC performs the same task.
Your JVM args are set to run a full GC every 10 minutes (gcInterval=600000 ms), so running another one manually should not be necessary.
Again, Carl.