Jrun.exe at 99% CPU, Website unresponsive

New Here, Mar 07, 2007

I'm running ColdFusion MX 6.1 Updater with IIS 5 on a Windows 2000 Server (P4 2.4GHz, 1GB RAM). Recently I decided to install updates/hotfixes that I'd been behind on (it was running 6.1 without any updates). I installed the 6.1 Updater, then proceeded to install the hotfixes on this page, minus the ones that didn't apply (IIS 6, random database fixes): http://www.adobe.com/cfusion/knowledgebase/index.cfm?id=b3a939ce (everything above 6.1 Updater, of course)

I also modified the JVM arguments and Scheduler arguments as recommended on this page:
http://www.sargeway.com/blog/index.cfm/2004/10/19/CFMX-Performance

Currently, the JVM arguments are as follows:
quote:

-server -Dsun.io.useCanonCaches=false -Xbootclasspath/a:"{application.home}/../lib/webchartsJava2D.jar" -Djavax.xml.parsers.SAXParserFactory=com.macromedia.crimson.jaxp.SAXParserFactoryImpl -Djavax.xml.parsers.DocumentBuilderFactory=com.macromedia.crimson.jaxp.DocumentBuilderFactoryImpl -XX:NewSize=64m -XX:PermSize=32m -XX:MaxPermSize=128m -XX:+UseConcMarkSweepGC -XX:+DisableExplicitGC


Min JVM heap size is set to 256MB, Max is 512MB. I'm running J2SDK 1.4.2_11 (I was running 1.4.2b28 before, or whichever version was distributed with MX 6).





So, the server runs fine for a while, but without any warning, JRUN.EXE starts using 99% of the CPU, and the website is effectively unresponsive. This goes on for about 5 minutes, and then JRUN.EXE seems to go back to normal, yet IIS still won't respond to any requests.

When this first happened, I tried stopping and starting the World Wide Web Publishing service, but it was unresponsive. I had to use a third-party utility (pskill) to stop INETINFO.EXE, at which point it restarted automatically, and web pages were served as normal again. If I killed jrun.exe (with the same utility) while it was using all of the CPU, this would also fix the site's issues.

Looking through the logs, the only thing that looks significant around the same time is a few "Connection Reset" entries in \runtime\logs\default-event.log, but these could be due to people hitting Stop because the page is taking too long to load.

What's awesome is that I have a test server configured exactly the same way (same software, same hotfixes, same JVM configuration), except for the CPU (P3 ~400MHz), and it hasn't had any problems whatsoever.

So, can anyone give me some insight as to why this might be happening?

New Here, Mar 07, 2007

Also, here's some of my settings:
Max simultaneous requests: 8
Timeout: 30 sec
Max cached templates: 512
Max cached queries: 100
Using MySQL 5.0 as DB server (also for client variables)

Guest, Mar 09, 2007

Bring your cached queries down to 50. The default 100 is entirely too high.

You might want to consider reviewing the IIS logs to figure out what pages are running around the time your CPUs spike.

What version of Java are you running?
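
For the IIS log review suggested above, here is a minimal Python sketch. It assumes the site logs in W3C Extended format with the `date`, `time`, and `cs-uri-stem` fields enabled (they may not all be on by default, and W3C timestamps are in UTC). The function name `requests_near` and the sample lines are made up for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def requests_near(log_lines, spike, window_minutes=5):
    """Count URLs logged within +/- window_minutes of the spike time."""
    fields = []
    hits = Counter()
    window = timedelta(minutes=window_minutes)
    for line in log_lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The #Fields directive names the columns for the rows that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split()))
        if not {"date", "time", "cs-uri-stem"} <= row.keys():
            continue  # required fields are not being logged
        when = datetime.strptime(row["date"] + " " + row["time"],
                                 "%Y-%m-%d %H:%M:%S")
        if abs(when - spike) <= window:
            hits[row["cs-uri-stem"]] += 1
    return hits.most_common()


# Hypothetical log excerpt, only to show the call.
sample = [
    "#Fields: date time cs-method cs-uri-stem sc-status",
    "2007-03-12 14:58:10 GET /index.cfm 200",
    "2007-03-12 15:01:45 GET /search.cfm 200",
    "2007-03-12 15:02:03 GET /search.cfm 200",
    "2007-03-12 16:30:00 GET /about.cfm 200",
]
print(requests_near(sample, datetime(2007, 3, 12, 15, 0, 0)))
```

Pages that dominate the list around each spike are the first candidates to look at.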

Explorer, Mar 09, 2007

If you have not done so already, enable the logging of slow running pages taking longer than 5 seconds (see http://livedocs.macromedia.com/coldfusion/7/htmldocs/00001718.htm). This will at least hopefully point out specific templates that are causing the problem (i.e. running a very long time and backing things up). Depending on what those templates do, it might help isolate the problem.

If the same configuration is working on test, you might be running into some environmental issue (network, much more data pulled back from the production DB vs. the test DB, etc.).





New Here, Mar 09, 2007

It looks to me like there is something wrong with your web connector threads. If I were you, I would run CF using JVM 1.5+ and take a look at what's going on using jconsole. Pay special attention to the threads; you may find that something in particular is using up your active threads and locking the application.

New Here, Mar 09, 2007

You need to add these arguments to your Java args to make jconsole work with JRun:

-Dcom.sun.management.jmxremote.port=9300 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false

9300 is an arbitrary port that I use (JRun web port + 1000).
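
The "web port + 1000" convention can be sketched as a small Python helper that builds the same three arguments. The function name `jmx_args` is hypothetical, and 8300 as the web port is only inferred from the 9300 example. With authentication and SSL both off, anyone who can reach the port can connect, so it should stay behind a firewall.

```python
def jmx_args(web_port, offset=1000):
    """Build the JMX remote arguments, using web_port + offset as the JMX port."""
    port = web_port + offset
    if not 1 <= port <= 65535:
        raise ValueError("JMX port out of range: %d" % port)
    return " ".join([
        "-Dcom.sun.management.jmxremote.port=%d" % port,
        "-Dcom.sun.management.jmxremote.ssl=false",
        "-Dcom.sun.management.jmxremote.authenticate=false",
    ])


# Assumed web port of 8300, which gives the 9300 used above.
print(jmx_args(8300))
```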

Guest, Mar 09, 2007

If you're going to try to run CF on Java 1.5, Adobe isn't going to help. 1.4.2 is the only version supported for CF.

I wouldn't generally recommend moving to 1.5, as there are many tools out there that will do the same thing and work with 1.4.2. SeeFusion comes immediately to mind.

New Here, Mar 12, 2007

Thanks for all the tips. I've been on vacation since last Thursday, but I'll start logging and try them one at a time to see which seems to work.

New Here, Mar 12, 2007

So far I've reduced the number of cached queries to 50 and enabled logging for pages that take longer than 10 seconds. Haven't noticed anything irregular yet...

New Here, Mar 12, 2007

It looks like I spoke too soon. JRun maxes out the CPU after anywhere from 5 to 15 minutes after I restart it. The only pages that consistently run longer than 10 seconds are our most-hit pages, which is to be expected. I've analyzed the queries that run on those pages and altered the MySQL tables to include indexes for columns that were missed when the tables were created originally. I'm not sure if this will help (especially considering that it's JRun having the issues, not MySQL), but I figured it was worth a shot.

New Here, Mar 12, 2007

I tried installing JDK 1.5, only to find out that because I'm not using ColdFusion in J2EE configuration with JRun4, I can't use anything above 1.4.2. I reverted to 1.4.2_11 and removed all the hotfixes in runtime/servers/lib, save the MySQL 5.0 connector.

Another thing I think I can try is to revert wsconfig.jar to the version that came with Updater 1. Currently it's at the version from this hotfix:
http://www.adobe.com/cfusion/knowledgebase/index.cfm?id=238944b1

New Here, Mar 12, 2007

I've tried everything I can think of, but the server still dies within the hour. For now, I've uninstalled Updater 1 and reinstalled 6.1 from the CD. The site seems to be running fine, except that navserver.cfm shows up as a blank page in CF Administrator (via SSL). I know there was an issue with 6.1 Updater 1 and this page, but as I've reverted to 6.1 and verified that the old navserver.cfm page is there, shouldn't it show up fine?

New Here, Mar 12, 2007

Does jrun.exe on your production server leak memory like a sieve and max out the CPU?
Does this happen on the hour?
Is your development server perfectly fine?

This is the problem I was having. I tried all of the following first, and finally stumbled upon the fix yesterday.

If someone from Adobe, or a forum junkie here reads this, please remember this next time. I spent at least 2 months trying to figure this out, and have seen countless other lost strangers on this very forum who are definitely suffering from this:

First, fixes I found that might help you, but didn't work for me:
1. Patch server (always a good first step)
2. Make sure cflock (for session variables) is being used correctly
3. robots.txt, lower crawl speed ("Crawl-delay" setting). (Google Tools didn't recognize this parameter) (just remembered my robots.txt is still gimped)
4. You can run the "Coldfusion Administrator" service with desktop access (Log On tab) and use CTRL+BREAK to do a dump for infinite loop type bugs

I probably tried 5 or 6 other things at random; my server now has hotfixes and monitoring software galore.

It's not really a bug, it's a "best practices" notice I foolishly glanced over, or never learned as a developer.

It's a troubleshooting nightmare because:
A) It's extremely hard to replicate (it starts on the hour).
B) You install a fix, see the server run great for 30 minutes, think you've fixed it, and blam, at random, for seemingly no trackable reason (you don't expect it to be hourly when you have no scheduled tasks on that system), it starts growing to 900MB mem usage and using 20-99% CPU, BEEP.
C) It is nearly impossible to replicate on a small development server.


This is what worked for me:
Example 6


quote:

Symptom: CPU spins at 100% - and high memory usage - about every hour after startup (large memory use too).

Here's a common one that some fall into going from development to production. The default for storing client variables is "REGISTRY". Once the number of records in the registry is large, the query to get all records and delete expired client records can take 100% CPU for minutes at a time.

Registry client storage should never be used for production systems but often developers fall into this by accident by not explicitly specifying a data source (and the ColdFusion admin defaulting to "REGISTRY"). Since the registry isn't a real database, ColdFusion has to retrieve the entire registry client tree (high memory usage) and compare the date/time one at a time to decide whether to purge a record. This is a CPU-intensive operation with an hourly purge that may only delete a few records.

Since this is a system task, it's running in a scheduler thread instead of a jrpp thread. Here's a case where a scheduler thread is relevant.
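
A toy Python model of the purge described in the quote, under loud assumptions: it is not ColdFusion's actual code, the names `purge_expired` and `client_id` are invented, and the 90-day expiry is only an illustrative value. It shows why the cost follows the total number of stored records rather than the number deleted.

```python
from datetime import datetime, timedelta


def purge_expired(records, now, max_age):
    """Scan every record and drop the expired ones.

    Returns (kept, scanned, deleted). With no index on the timestamp,
    each record has to be loaded and compared one at a time.
    """
    kept = {}
    deleted = 0
    for client_id, last_visit in records.items():
        if now - last_visit > max_age:
            deleted += 1
        else:
            kept[client_id] = last_visit
    return kept, len(records), deleted


now = datetime(2007, 3, 12, 15, 0, 0)
# 50,000 hypothetical client records, only 10 of them expired.
records = {i: now - timedelta(days=1) for i in range(50000)}
for i in range(10):
    records[i] = now - timedelta(days=120)

kept, scanned, deleted = purge_expired(records, now, timedelta(days=90))
print(scanned, deleted)
```

Every hourly run pays for the full scan even when almost nothing is purged, which fits the symptom described in the quote.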



Before (and after) I did this, I purged the registry entries, about 50,000, using this

New Here, Mar 13, 2007

I'll try running a thread dump, but I've been using a MySQL database for client variables for quite a while.

After reverting to 6.1, I'm still having the same issue. Looking at the jrun log in the runtime\lib\wsconfig\1 directory, there are a lot of these entries around the time it hangs:
[3960] dropped.
returning error page for Connection reset by peer
returning error page for Connection refused
PROXY_BUSY <- [3080]
returning error page for JRun too busy or out of memory

I'm not sure if this is related.
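
To see whether these entries cluster around the hangs, a rough Python tally along these lines might help. It assumes the log lines look like the excerpt above, possibly with a prefix such as a timestamp in front. The function name `tally_connector_errors` is hypothetical.

```python
from collections import Counter


def tally_connector_errors(lines):
    """Count connector log entries by kind, based on the phrases in the excerpt."""
    marker = "returning error page for "
    counts = Counter()
    for line in lines:
        line = line.strip()
        if marker in line:
            # Use the text after the marker as the error reason.
            counts[line.split(marker, 1)[1]] += 1
        elif "PROXY_BUSY" in line:
            counts["PROXY_BUSY"] += 1
        elif line.endswith("dropped."):
            counts["dropped"] += 1
    return counts


excerpt = [
    "[3960] dropped.",
    "returning error page for Connection reset by peer",
    "returning error page for Connection refused",
    "PROXY_BUSY <- [3080]",
    "returning error page for JRun too busy or out of memory",
]
for reason, count in tally_connector_errors(excerpt).items():
    print(count, reason)
```

Running it over the log for a quiet period and again for a hang period would show whether "JRun too busy or out of memory" only appears when JRun stops responding.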
