You have some bad / expensive code somewhere.
Something in the code is causing CPU usage to climb until the server grinds to a halt. You will need to use something like the Server Monitor or FusionReactor to see which requests are running when this happens.
Once you can see what page is causing the issue, you can then look at the code to see why this is happening.
+1 what haxbh said
Andrew, you could also monitor the CF Java process and Tomcat to confirm that they are not the cause of the CPU maxing out. You can use free tools like JMC (Java Mission Control), which is part of the Oracle JDK, to check on CF's Java process and Tomcat once CF has JMX (Java Management Extensions) enabled.
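For anyone curious what the JMX approach exposes, here is a minimal sketch of the kind of per-thread CPU data a tool like JMC reads. It uses the standard java.lang.management API and, to stay self-contained, inspects the JVM it runs in rather than attaching to ColdFusion. The class name BusyThreads and the method topThreads are made up for illustration. Pointing something like this at a CF instance would additionally need a remote JMX connection, which is not shown here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: lists the threads of the current JVM that have used the most CPU.
public class BusyThreads {

    /** Name and accumulated CPU time (in milliseconds) of one live thread. */
    static final class Sample {
        final String name;
        final long cpuMillis;

        Sample(String name, long cpuMillis) {
            this.name = name;
            this.cpuMillis = cpuMillis;
        }
    }

    /** Returns up to `limit` live threads, ordered by CPU time, highest first. */
    static List<Sample> topThreads(int limit) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        List<Sample> samples = new ArrayList<>();
        if (!threads.isThreadCpuTimeSupported() || !threads.isThreadCpuTimeEnabled()) {
            return samples; // CPU time measurement is unavailable on this JVM
        }
        for (long id : threads.getAllThreadIds()) {
            long cpuNanos = threads.getThreadCpuTime(id); // -1 if the thread has died
            ThreadInfo info = threads.getThreadInfo(id);  // null if the thread has died
            if (cpuNanos >= 0 && info != null) {
                samples.add(new Sample(info.getThreadName(), cpuNanos / 1_000_000));
            }
        }
        samples.sort(Comparator.comparingLong((Sample s) -> s.cpuMillis).reversed());
        return samples.subList(0, Math.min(limit, samples.size()));
    }

    public static void main(String[] args) {
        for (Sample s : topThreads(10)) {
            System.out.println(s.cpuMillis + " ms  " + s.name);
        }
    }
}
```

The figures are cumulative since each thread started, so taking two samples a few seconds apart and comparing them gives a better picture of which threads are busy right now.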
I have gotten in touch with support and have provided a thread dump and a heap dump to them. I am waiting for them to get back to me.
Support requested that I apply the latest CF updates and rerun the site connectors. Then they wanted a fresh thread dump and heap dump to analyze. Unfortunately this was on a production system, so I have to go through the process of applying the updates to dev and test and then running through QA before I can apply them to production. This process takes about a week, so I had to go ahead and restart the services on the Prod system to get rid of the issue.
Now I have to wait for the CPU to max out again before running the thread and heap dumps. This usually only happens every couple of weeks.
We are experiencing the same issue. After about 6 days under production load we see abnormally high CPU, requests take longer, and overall performance degrades. Restarting CF immediately fixes the issue, and CPU profiles return to normal under the same load.
We are also running CF 2016 update 3 on Windows Server 2012 R2. It is a cluster of 6 nodes, and the nodes begin to spike within a few hours of each other, as they all get restarted at the same time and take the same share of the distributed traffic.
This occurred in production, and I was unable to take a heap dump at the time because the site was failing.
Have you had any new updates on the issue?
Unfortunately our issue seems more sporadic and can take up to three weeks before it happens. I am waiting now for the next occurrence to happen so I can send thread and heap dumps to support.
It could be anything; you need to have full-time monitoring installed. We use FusionReactor, so consider that. It's free for 2 weeks and then you can go month by month after that.
I had something similar happen, and it was that the jvm.config default of -XX:MaxMetaspaceSize=192m was not large enough. I increased mine to 512m and that helped.
Also play around with using -XX:+UseConcMarkSweepGC (instead of -XX:+UseParallelGC) together with -XX:+CMSParallelRemarkEnabled -XX:+ScavengeBeforeFullGC -XX:+CMSScavengeBeforeRemark in your jvm.config.
Here's some of the args I use:
java.args=-server -Xms5g -Xmx10g -XX:ReservedCodeCacheSize=128m -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=512m -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+ScavengeBeforeFullGC -XX:+CMSScavengeBeforeRemark
The CPU spiked again over the weekend. I was able to get a snapshot from within Server Monitor and also a heap dump on the server. I have sent both to Adobe CF support. I will add updates as I get them.
We're having the same issue: ACF 2016 running on Windows 2012. We will try some of the suggestions provided by Neo Rye, but one instance we keep strictly for scheduled tasks, and as it is a new dev box there are no tasks yet. Still, it will jump to 100% CPU within a couple of weeks of a reboot.
CF support reviewed our heap and thread dumps. We don't have any memory leaks. They did find that the worker thread for monitoring services is getting blocked frequently, so I changed the server monitoring IP in the jetty.xml from 0.0.0.0 to 127.0.0.1.
They also had me increase the -Xmx value in the jvm.config from 1 GB to 2 GB and change the garbage collection setting from Parallel to G1.
So far everything is working since the change but only time will tell.
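After a jvm.config change like the one above, it can be worth confirming that the JVM actually picked up the new heap size and collector. The sketch below uses the standard java.lang.management API and reports on the JVM it runs in. The class name GcCheck and its method names are made up for illustration, and it would have to run inside, or attach to, the CF JVM to say anything about ColdFusion itself. On HotSpot the G1 collector beans are normally named "G1 Young Generation" and "G1 Old Generation", while the Parallel collector shows up as "PS Scavenge" and "PS MarkSweep". Treat those names as a convention rather than a guarantee.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports which collectors and heap limit the current JVM is using.
public class GcCheck {

    /** Names of the garbage collectors active in this JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** Maximum heap the JVM will try to use, in megabytes (roughly tracks -Xmx). */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Collectors: " + collectorNames());
        System.out.println("Max heap:   " + maxHeapMb() + " MB");
        // Cumulative collection counts and times since JVM start, per collector.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms total");
        }
    }
}
```

The reported maximum heap is usually a little below the -Xmx value, so a figure slightly under 2048 MB would still be consistent with a 2 GB setting.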
I have this same issue on my two ColdFusion 2016 servers, but I will add the following. Our security guys run a scan of my servers twice weekly. The jump in CPU utilization that I see on my servers is directly related to these scans. I asked them to stop doing them for a few weeks and the issue stopped. Then as soon as they restarted the scans the issue returned. One of the scans runs every Sunday. Before the scan the CPU was running in the 3-7% range. It being a Sunday, with our users off for the weekend, nobody was using any of the websites/applications. Following completion of the scan, CPU utilization jumped to 88% and is holding in that area.
I am 100% convinced that the scan which attempts to access various ColdFusion scripts and functions is not releasing them so they remain in some sort of active state and thus do not release the CPU.
Note: To get by our issue until a fix is determined we have a reboot of the server scheduled for each Sunday evening.