I think you're right - it's just too hard for Adobe. We spent our $500 with Adobe, and for 6 weeks we sent them logs, code and configuration file settings, only to have them give up and say they have no clue. Save your money on Adobe. We are now looking into a Java memory profiler and CF Enterprise so we can run multiple instances and hope that all the instances don't crash at the same time. (Insert more irony here - CF doesn't work well, so we buy more CF?!?)
We also purchased Fusion Reactor, which can kill those long-running threads for you, either manually or on its own based on a time limit you set. Yes, I know this is not a solution; it's just a reaction (hence the name of the product) to the poor programming in parts of ColdFusion.
Our problem is that a thread (or several threads) won't die completely, and then memory builds up over a span of about 20 minutes. Others in other threads have noted that it takes hours for their memory to max out. The reason ours maxes out so fast (at 1 GB of memory for the JVM) is that we average 10-15 simultaneous requests/second. The hardware is big and massive, and the CPU usage is about 33% across all 4. We are using Fusion Reactor (FR). It shows the threads that are running and the ones that are not releasing. We have FR set to kill long-running threads, but that does not always work. We can almost guarantee that when a thread gets killed we are going to have a crash in about 20 minutes. It happens so often that I wrote a BASH script that runs every minute, checks whether our web service is still responsive, and does a CF restart when it fails the response test twice. At least this lets us sleep at night.
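For anyone wanting to try the same trick, here is a minimal sketch of that kind of watchdog. It is written in Python rather than BASH and is not the actual script described above. The health-check URL, the restart command path, and all function names are assumptions made up for illustration, so adjust them for your own install.

```python
import subprocess
import time
import urllib.request

# Hypothetical values, not taken from the setup described in this thread.
HEALTH_URL = "http://localhost:8500/health.cfm"            # assumed test page
RESTART_CMD = ["/opt/coldfusion/bin/coldfusion", "restart"]  # assumed path
FAILURES_BEFORE_RESTART = 2


def is_responsive(url=HEALTH_URL, timeout=10):
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        # Timeouts, refused connections and HTTP errors all count as a failure.
        return False


def restart_coldfusion():
    """Run the (assumed) restart command; a failed restart is not fatal here."""
    subprocess.run(RESTART_CMD, check=False)


def next_state(failures, ok, threshold=FAILURES_BEFORE_RESTART):
    """Return (new_failure_count, should_restart) after one check."""
    if ok:
        return 0, False          # a good response resets the counter
    failures += 1
    if failures >= threshold:
        return 0, True           # restart and start counting again
    return failures, False


def watch(check=is_responsive, restart=restart_coldfusion,
          sleep=time.sleep, interval=60, max_checks=None):
    """Check once per interval; restart after consecutive failed checks."""
    failures = 0
    done = 0
    while max_checks is None or done < max_checks:
        failures, should_restart = next_state(failures, check())
        if should_restart:
            restart()
        sleep(interval)
        done += 1
```

Calling `watch()` with no arguments runs forever, checking once a minute. Requiring two failed checks in a row means a single slow response does not trigger a restart, which matches the "fails the response test twice" rule.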
Multiple instances will not really do the trick without some planning. For example, if you have appA shared/balanced across instances X and Y, and appA relies on WebService Z somewhere else and that service is hanging, then both instance X and instance Y will go down. The only viable solution I can think of is to split appA into the basic stuff that can time out correctly, running on instance X, and the stuff that doesn't obey timeouts, running on instance Y, and then use a scripty thing to determine whether a user of appA can actually do the cool stuff on instance Y (perhaps displaying a message like "service unavailable" when it can't). But this seems like a lot of work for something that should be simple. Remember, CF is supposed to be RAD.
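The "scripty thing" could be as simple as the sketch below. It is only an illustration in Python, not CFML, and the ping URL, the `Availability` class and the function names are all hypothetical. The idea is to probe instance Y with a short timeout, remember the answer for a few seconds so not every request pays for the probe, and show a message instead of the feature when Y is down.

```python
import time
import urllib.request

# Hypothetical ping page on instance Y; not a real URL from this thread.
INSTANCE_Y_PING = "http://instance-y.example.com/ping.cfm"
UNAVAILABLE_MESSAGE = "Service unavailable, please try again later."


def ping_instance_y(url=INSTANCE_Y_PING, timeout=2):
    """Return True if instance Y answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        # Any timeout or connection error is treated as "down".
        return False


class Availability:
    """Caches the result of a probe for ttl seconds."""

    def __init__(self, probe=ping_instance_y, ttl=30, clock=time.monotonic):
        self._probe = probe
        self._ttl = ttl
        self._clock = clock
        self._checked_at = None
        self._up = False

    def is_up(self):
        now = self._clock()
        # Probe again only on first use or once the cached answer is stale.
        if self._checked_at is None or now - self._checked_at >= self._ttl:
            self._up = bool(self._probe())
            self._checked_at = now
        return self._up


def render_cool_feature(availability, render):
    """Call render() only when instance Y looks healthy, else show a message."""
    if availability.is_up():
        return render()
    return UNAVAILABLE_MESSAGE
```

The basic pages on instance X would call `render_cool_feature(...)` around anything that depends on instance Y. This only keeps X from waiting on a dead Y; it does nothing about the hung threads on Y itself.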
The thread-killing stuff is a nightmare. As far as I can see, it will always knock over the server.
I looked at Fusion Reactor before but never got it. I just checked out their site, and they have a nice debugging thing coming out soon.
Actually, we are moving to a fully clustered environment. It won't be easy or cheap, but we have to improve our service and uptime. Everything will run from multiple servers with multiple instances of CF. We have 4 db servers and 4 photo servers in place already. We are adding a CF server and a load balancer machine.
So this should overcome CF's weak points.