8 Replies Latest reply on Aug 1, 2014 12:26 PM by Dink477

    ColdFusion 10 instance/Tomcat dying at predictable intervals (white screen of death)

    Dink477

      Cross-posted from Stack Overflow: "ColdFusion 10 instance/Tomcat dying at predictable intervals (white screen of death)".

       

      ---

       

      For the last few days our team has been struggling with an ongoing issue: at very predictable intervals, one ColdFusion instance hits the white screen of death (WSOD).

       

      Every three hours the site would simply start returning a blank white page for any URL. We would then restart the instance and everything would be great... for another three hours, almost to the minute. Of course, this happened on a Friday, so all weekend people were taking turns restarting the instance every time it died.

       

      As best as I can discern, no one made any changes to either ColdFusion or our server environment right before this started happening. Before this the instance was running fine.

       

      Since then we've seen that the isapi_redirect.log file for this instance is filled with Tomcat/connection errors.

       

      We followed the excellent instructions at "Resolve Stability Problems and SPEED UP ColdFusion 10" (Web Trenches) and adjusted our connector settings as recommended. While this may very well have helped our general performance, and it stretched the time between crashes from 3 to 3.5 hours, it has not resolved the problem.

       

      Before that, we even tried moving the site from one of our virtual servers to another, with no luck.

       

      We tried restarting IIS, and one night even rebooting the entire server, to see if that would help, but still nothing.

       

      Below is as much information as I can provide from what we are seeing in our logs and configurations. Any help would be very much appreciated, and please let me know what other details would be useful.

       

      ---

       

      We are running IIS v7.5.7600.16385

       

      This is the only website/IIS record bound to this instance and it's bound specifically to it, not "All websites".

       

      When the problem occurs, I do not think any requests make it to the instance: the IIS logs show that connections are still coming in, but the http.log files for the instance just stop.

       

      I am not sure whether the Tomcat-related errors are the problem or a symptom.

       

      The server itself runs fine when the problem occurs; we have several other CF instances running alongside this one that have no issues.

       

      The CF admin for the instance in question loads and is completely responsive during the problem (in my experience with past instance problems, that has often not been the case).

       

      Again, no one changed anything with our code, CF instance configuration, or server configuration directly prior to this problem starting as far as we can tell.

       

      ---

       

      Server Product: ColdFusion

      Version: 10,0,13,287689

      Tomcat Version: 7.0.23.0

      Edition: Enterprise

      Operating System: Windows Server 2008 R2

      OS Version:  6.1

      Update Level: chf10000013.jar

      Adobe Driver Version: 4.1 (Build 0001)

       

      ---

       

      workers.properties:

       

      worker.list=Instance_Codebase
      worker.Instance_Codebase.type=ajp13
      worker.Instance_Codebase.host=localhost
      worker.Instance_Codebase.port=8014
      worker.Instance_Codebase.max_reuse_connections=250
      worker.Instance_Codebase.connection_pool_size=250
      worker.Instance_Codebase.connection_pool_timeout=60

       

      ---

       

      Below is a sample of our isapi_redirect.log; a larger chunk can be viewed at http://trasper.com/files/isapi_redirect.log.txt.

       

      The problem (in this example) happened at about 11:41pm, as far as we can tell.

       

      [Wed Jun 25 23:40:34.503 2014] [10012:912] [info] ajp_send_request::jk_ajp_common.c (1658): (Instance_Codebase) all endpoints are disconnected, detected by connect check (27), cping (0), send (0)

       

      [Wed Jun 25 23:40:34.504 2014] [10012:1396] [info] ajp_connection_tcp_get_message::jk_ajp_common.c (1313): (Instance_Codebase) can't receive the response header message from tomcat, network problems or tomcat (127.0.0.1:8014) is down (errno=54)

      [Wed Jun 25 23:40:34.820 2014] [10012:1396] [error] ajp_get_reply::jk_ajp_common.c (2190): (Instance_Codebase) Tomcat is down or refused connection. No response has been sent to the client (yet)

      [Wed Jun 25 23:40:34.823 2014] [10012:1396] [info] ajp_service::jk_ajp_common.c (2692): (Instance_Codebase) sending request to tomcat failed (recoverable),  (attempt=1)

        

      [Wed Jun 25 23:40:34.708 2014] [10012:7880] [error] ajp_get_reply::jk_ajp_common.c (2190): (Instance_Codebase) Tomcat is down or refused connection. No response has been sent to the client (yet)

       

      [Wed Jun 25 23:40:40.477 2014] [10012:2296] [info] ajp_connect_to_endpoint::jk_ajp_common.c (1047): Failed opening socket to (127.0.0.1:8014) (errno=61)

       

      [Wed Jun 25 23:40:40.364 2014] [10012:8256] [error] ajp_service::jk_ajp_common.c (2711): (Instance_Codebase) connecting to tomcat failed.

       

      [Wed Jun 25 23:40:40.825 2014] [10012:7060] [error] HttpExtensionProc::jk_isapi_plugin.c (2309): service() failed with http error 503

       

      [Wed Jun 25 23:40:40.877 2014] [10012:10364] [error] ajp_send_request::jk_ajp_common.c (1669): (Instance_Codebase) connecting to backend failed. Tomcat is probably not started or is listening on the wrong port (errno=61)

      [Wed Jun 25 23:40:40.965 2014] [10012:10364] [info] ajp_service::jk_ajp_common.c (2692): (Instance_Codebase) sending request to tomcat failed (recoverable), because of error during request sending (attempt=1)

       

      [Wed Jun 25 23:40:40.857 2014] [10012:1020] [error] HttpExtensionProc::jk_isapi_plugin.c (2309): service() failed with http error 503
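To pin down when the backend actually stops answering, it can help to tally the log rather than eyeball it; note that the excerpt above is not strictly in time order (the 23:40:34.708 entry appears after the 23:40:34.823 one). Below is a minimal Python sketch, assuming every entry follows the "[timestamp] [pid:tid] [level] message" layout shown above. The script name and output format are hypothetical; this is not part of the Tomcat connector.

```python
"""Tally isapi_redirect.log entries to see when the AJP backend starts failing.

Hypothetical helper. Assumes lines shaped like:
[Wed Jun 25 23:40:34.503 2014] [10012:912] [info] function::file (line): message
"""
import re
import sys
from collections import Counter
from datetime import datetime

# [timestamp] [pid:tid] [level] rest-of-line
LINE_RE = re.compile(
    r"^\[(?P<ts>[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}\.\d{3} \d{4})\]"
    r" \[\d+:\d+\] \[(?P<level>\w+)\] (?P<msg>.*)$"
)
ERRNO_RE = re.compile(r"errno=(\d+)")


def summarize(lines):
    """Return (level counts, errno counts, earliest timestamp of an [error] line)."""
    levels = Counter()
    errnos = Counter()
    first_error = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # blank lines, wrapped text, anything not in the expected layout
        level = m.group("level")
        levels[level] += 1
        for code in ERRNO_RE.findall(m.group("msg")):
            errnos[int(code)] += 1
        if level == "error":
            ts = datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S.%f %Y")
            # Take the minimum, because entries from different threads interleave out of order.
            if first_error is None or ts < first_error:
                first_error = ts
    return levels, errnos, first_error


if __name__ == "__main__":
    # Usage: python tally_isapi_log.py isapi_redirect.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        levels, errnos, first_error = summarize(fh)
    print("levels:", dict(levels))
    print("errno:", dict(errnos))
    print("first [error] entry at:", first_error)
```

Running it over the full log from each crash window should show whether the first errors line up with the three-hour pattern.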

        • 1. Re: ColdFusion 10 instance/Tomcat dying at predictable intervals (white screen of death)
          carl type3 Level 4

          Your isapi_redirect.log sample indicates that the Tomcat (also known as Catalina) part of the CF10 system is failing, e.g.:

           

          [info] ajp_connection_tcp_get_message::jk_ajp_common.c (1313): (Instance_Codebase) can't receive the response header message from tomcat, network problems or tomcat (127.0.0.1:8014) is down (errno=54)

           

          You helpfully provided a sample of workers.properties. Did the AJP connector in CF10\Instance_Codebase\runtime\conf\server.xml also get adjusted to match the Tomcat ISAPI connector adjustments?

          For example, the relevant portion of server.xml:


          <Connector port="8014" protocol="AJP/1.3"
          redirectPort="8445"
          tomcatAuthentication="false"
          maxThreads="250"
          connectionTimeout="60000">
          </Connector>
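One way to run that check is to compare the two files programmatically. This is a hypothetical Python sketch, not an Adobe or Apache tool. It assumes a single AJP/1.3 connector whose port matches the worker's port, and it relies on the Tomcat Connectors convention that connection_pool_timeout is given in seconds while the AJP connector's connectionTimeout is in milliseconds.

```python
"""Compare the AJP connector in server.xml with the ISAPI worker in workers.properties.

Hypothetical helper (not an Adobe or Apache tool). Assumes one AJP/1.3 connector whose
port matches the worker's port, as in this thread.
"""
import sys
import xml.etree.ElementTree as ET


def read_worker(properties_text, worker):
    """Collect worker.<name>.<key>=<value> settings for one worker."""
    prefix = f"worker.{worker}."
    settings = {}
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key.startswith(prefix):
            settings[key[len(prefix):]] = value
    return settings


def check(server_xml, properties_text, worker):
    """Return a list of mismatches; an empty list means the two files agree."""
    settings = read_worker(properties_text, worker)
    port = settings.get("port")
    root = ET.fromstring(server_xml)  # accepts str or bytes
    matches = [c for c in root.iter("Connector")
               if c.get("protocol") == "AJP/1.3" and c.get("port") == port]
    if not matches:
        return [f"no AJP/1.3 connector on port {port} in server.xml"]
    connector = matches[0]
    problems = []
    pool = settings.get("connection_pool_size")
    # Tomcat 7's documented default when maxThreads is omitted (ignoring shared executors).
    threads = connector.get("maxThreads", "200")
    if pool != threads:
        problems.append(f"connection_pool_size={pool} but maxThreads={threads}")
    timeout_s = settings.get("connection_pool_timeout")  # seconds
    timeout_ms = connector.get("connectionTimeout")      # milliseconds; unset = no timeout
    if timeout_s is None or timeout_ms is None or int(timeout_s) * 1000 != int(timeout_ms):
        problems.append(
            f"connection_pool_timeout={timeout_s}s but connectionTimeout={timeout_ms}ms")
    return problems


if __name__ == "__main__":
    # Usage: python check_ajp.py server.xml workers.properties Instance_Codebase
    with open(sys.argv[1], "rb") as xml_file, open(sys.argv[2], encoding="utf-8") as props:
        issues = check(xml_file.read(), props.read(), sys.argv[3])
    print("\n".join(issues) or "server.xml and workers.properties agree")
```

Against a connector like the one above (maxThreads="250", connectionTimeout="60000") and a worker with connection_pool_size=250 and connection_pool_timeout=60, it should report no mismatches.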

           

          You have likely applied the updated IIS connector that came with Update 13 via the wsconfig tool; however, as a sanity check, what are the size and date stamp of CF10\config\wsconfig\N\isapi_redirect.dll?

           

          You might like this official CF blog entry:

          http://blogs.coldfusion.com/post.cfm/coldfusion-11-iis-connector-tuning

           

          You might like this presentation I did:

          http://experts.adobeconnect.com/p8l51p4s9m4/


          HTH, Carl.

          • 2. Re: ColdFusion 10 instance/Tomcat dying at predictable intervals (white screen of death)
            Dink477 Level 1

            server.xml

             

            <Server port="8009" shutdown="SHUTDOWN">

                 <Listener className="org.apache.catalina.core.AprLifecycleListener" SSLEngine="on"></Listener>

                 <Listener className="org.apache.catalina.core.JasperListener"></Listener>

                 <Listener className="org.apache.catalina.core.JreMemoryLeakPreventionListener"></Listener>

                 <Listener className="org.apache.catalina.mbeans.GlobalResourcesLifecycleListener"></Listener>

                 <GlobalNamingResources>

                      <Resource description="User database that can be updated and saved" name="UserDatabase" pathname="conf/tomcat-users.xml" factory="org.apache.catalina.users.MemoryUserDatabaseFactory" type="org.apache.catalina.UserDatabase" auth="Container"></Resource>

                 </GlobalNamingResources>

                 <Service name="Catalina">

                      <Executor name="tomcatThreadPool" minSpareThreads="4" maxThreads="150" namePrefix="catalina-exec-"></Executor>

                      <Connector port="8014" protocol="AJP/1.3" redirectPort="8447" tomcatAuthentication="false" maxThreads="250" connectionTimeout="60000"></Connector>

                      <Engine jvmRoute="Instance_Codebase" name="Catalina" defaultHost="localhost">

                           <Realm className="org.apache.catalina.realm.LockOutRealm">

                                <Realm className="org.apache.catalina.realm.UserDatabaseRealm" resourceName="UserDatabase"></Realm>

                           </Realm>

                           <Host name="localhost" autoDeploy="false" unpackWARs="true" appBase="webapps">

                                <!--<Valve pattern="%h %l %u %t &quot;%r&quot; %s %b" directory="logs" prefix="localhost_access_log." className="org.apache.catalina.valves.AccessLogValve" suffix=".txt" resolveHosts="false"></Valve>-->

                           </Host>

                      </Engine>

                      <Connector port="8501" protocol="org.apache.coyote.http11.Http11NioProtocol" connectionTimeout="20000" redirectPort="8443" executor="tomcatThreadPool"></Connector>

                 </Service>

            </Server>

            • 3. Re: ColdFusion 10 instance/Tomcat dying at predictable intervals (white screen of death)
              Anit_Kumar Adobe Employee

              Hi,

               

              As mentioned on the Stack Overflow post as well, you are getting Error 502 (Bad Gateway) and Error 503 (Service Unavailable) alternately. The logs still contain only info/error entries, not debug information. Can you change the log level from "info" to "debug" and restart IIS?

               

              Your site's connector also needs tuning. You may refer to Connector Tuning, which is applicable to CF10 as well. You can enable metric logging (Debugging & Logging > Debug Output Settings) and then tune the connectors: use the Current Thread Count as input to connection_pool_size, and then set max_reuse_connections.

               

              Regards,

              Anit Kumar

              • 4. Re: ColdFusion 10 instance/Tomcat dying at predictable intervals (white screen of death)
                Dink477 Level 1

                carl type3,

                 

                The size of our isapi_redirect.dll is:

                • Size: 362KB (370,688 bytes)
                • Size on disk: 364KB (372,736 bytes)

                 

                Date stamps are:

                • Created: Wednesday, December 11, 2013, 10:47:31 AM
                • Modified: Saturday, November 02, 2013, 2:12:36 PM
                • Accessed: Wednesday, December 11, 2013, 10:47:31 AM

                 

                ---

                 

                I re-read the post on ColdFusion 10 Update 13 and do see where it says:

                2. After applying the update, configure/reconfigure the connector with the external web server(Apache) using wsconfig tool. It is available at {cf_install_home}/{instance_name}/runtime/bin.

                I don't think our connector was reconfigured when this update was installed (which was on: Fri, 17 Jan 2014 15:46:52 -0500).

                 

                A quick web search into this issue you raised brought up some resources like:

                 

                 

                I think we are going to rebuild the connector for this site and see if that helps.

                 

                What do you think, does this sound like it could be the issue?

                • 5. Re: ColdFusion 10 instance/Tomcat dying at predictable intervals (white screen of death)
                  carl type3 Level 4

                  The size and date stamp of isapi_redirect.dll show that it is patched up to date.

                   

                  I know it means work, but I think you could benefit from monitoring Tomcat using Java JMX settings along with the JDK tool JConsole.

                   

                  Regards, Carl.

                  • 6. Re: ColdFusion 10 instance/Tomcat dying at predictable intervals (white screen of death)
                    Dink477 Level 1

                    Thanks, everyone, for the input and assistance. As of today, we’ve been running WSOD-free for 4+ days and counting.

                     

                    We are still not sure what kicked off the problem, it might have just been a tipping point in web traffic, but I am pretty confident we have it under control now.

                     

                    In large part I believe it was an issue of connector tuning.

                     

                    ---

                     

                    By default, when a connector is created using the Web Server Configuration Tool (wsconfig.exe), the connection pool is set to 250 connections, but nothing matching is set in server.xml. We changed the AJP/1.3 connector to specify a matching maxThreads value and added a 60-second connectionTimeout, since AJP connections otherwise never time out.

                     

                    We also adjusted the workers.properties file to specify matching connection_pool_size and connection_pool_timeout values.

                     

                    The previous default settings seemed to line up with the isapi_redirect.log: every time we got to roughly 200 connections, Tomcat would stall (Tomcat 7's AJP connector falls back to maxThreads="200" when the attribute is not set). Matching up all these settings seems to have helped.
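Carl's JMX/JConsole suggestion gives the most detail, but a cruder way to watch for the roughly 200-connection stall is to count established connections on the AJP port from netstat output. The sketch below swaps in that simpler technique. It assumes Windows-style "netstat -an" rows and port 8014 from this thread, and the 180-connection warning threshold is purely hypothetical.

```python
"""Count established TCP connections on the instance's AJP port from 'netstat -an'.

Rough stand-in for JMX thread monitoring. Port 8014 comes from this thread;
the 180 warning threshold is hypothetical.
"""
import subprocess
import sys


def count_ajp_connections(netstat_output, port=8014):
    """Count ESTABLISHED TCP rows whose *local* address uses the given port.

    Both ends are on localhost, so each AJP connection appears twice in netstat
    (once per side); matching only the local port counts each connection once.
    """
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Windows rows look like: TCP  127.0.0.1:8014  127.0.0.1:52311  ESTABLISHED
        if len(fields) == 4 and fields[0] == "TCP" and fields[3] == "ESTABLISHED":
            if fields[1].rsplit(":", 1)[-1] == str(port):
                count += 1
    return count


if __name__ == "__main__":
    output = subprocess.run(["netstat", "-an"], capture_output=True, text=True).stdout
    n = count_ajp_connections(output)
    print(f"{n} established connections on AJP port 8014")
    sys.exit(1 if n >= 180 else 0)  # non-zero exit lets a scheduled task raise an alert
```

Because the ISAPI redirector keeps pooled connections open, this number tracks pool usage rather than in-flight requests, which is the figure that was creeping toward 200 here.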

                     

                    After the configuration changes, we deleted and then recreated the connector for the instance. This way we are 100% sure that the connector includes the latest changes from all the server updates.

                     

                    We also then restarted the website in IIS, but we had to ensure that the w3wp.exe process for the instance was reset as well (we killed the process and let it restart).

                     

                    Then we brought everything back up and have not had any problems since.

                     

                    ---

                     

                    Thanks again for the assistance both here and on Stack Overflow; it helped us focus in on some of our issues. I’ll be sure to update this post if any other information comes to light. I’m pretty sure these steps will help anyone having connector/Tomcat performance issues.

                     

                    Here are some of the great resources we were able to find that helped us out a lot:

                     

                     

                    ---

                     

                    And finally, here’s a summary of the changes and steps we made to clear the problem up:

                     

                    ---

                     

                    1.) server.xml

                     

                        Changed

                    <Connector port="8014" protocol="AJP/1.3" redirectPort="8446" tomcatAuthentication="false">

                        to

                    <Connector port="8014" protocol="AJP/1.3" redirectPort="8447" tomcatAuthentication="false" maxThreads="250" connectionTimeout="60000">

                     

                    ---

                     

                    2.) workers.properties

                     

                        Set (to match our number of connections)

                    worker.Instance_Codebase.max_reuse_connections=250


                        Added lines

                    worker.Instance_Codebase.connection_pool_size=250
                    worker.Instance_Codebase.connection_pool_timeout=60

                     

                    ---

                     

                    3.) Deleted the existing connector, then re-created it using the Web Server Configuration Tool (wsconfig.exe) for the instance (Be sure to Run As Administrator!).

                     

                        Also note that rebuilding the connector will likely require you to re-enter the above changes to your workers.properties file.

                     

                    ---

                     

                    4.) Restart the IIS site, which includes ensuring that the w3wp.exe process for the site is stopped/killed and restarted.

                     

                    ---

                     

                    5.) Start the instance and IIS site back up.
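Since rebuilding the connector can overwrite workers.properties, step 2 can be scripted so it is easy to re-apply. This is a minimal sketch, assuming the worker name Instance_Codebase and the values from step 2. The script name and the example path are hypothetical (the wsconfig folder number varies per connector).

```python
"""Re-apply the tuned worker settings after wsconfig regenerates workers.properties.

Hypothetical helper. Worker name and values are taken from step 2 of this thread.
"""
import sys

TUNED = {
    "worker.Instance_Codebase.max_reuse_connections": "250",
    "worker.Instance_Codebase.connection_pool_size": "250",
    "worker.Instance_Codebase.connection_pool_timeout": "60",
}


def apply_settings(text, settings):
    """Set each key to its value: rewrite existing key=value lines, append missing keys."""
    seen = set()
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        key = stripped.split("=", 1)[0].strip()
        if "=" in stripped and not stripped.startswith("#") and key in settings:
            out.append(f"{key}={settings[key]}")
            seen.add(key)
        else:
            out.append(line)  # leave comments and unrelated settings untouched
    out.extend(f"{k}={v}" for k, v in settings.items() if k not in seen)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # e.g. python reapply_workers.py "C:\ColdFusion10\config\wsconfig\1\workers.properties"
    # (hypothetical install path; check which wsconfig\N folder belongs to your site)
    path = sys.argv[1]
    with open(path, encoding="utf-8") as fh:
        updated = apply_settings(fh.read(), TUNED)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(updated)
```

Running it twice leaves the file unchanged. Assuming your connector keeps its redirector settings in isapi_redirect.properties in the same folder, the same function could also set {"log_level": "debug"} there, as Anit suggested.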

                    • 7. Re: ColdFusion 10 instance/Tomcat dying at predictable intervals (white screen of death)
                      C.Weissleder

                      Is there a specific reason why you changed the redirectPort of the connector in the server.xml?

                       

                       

                      1.) server.xml

                       

                          Changed

                      <Connector port="8014" protocol="AJP/1.3" redirectPort="8446" tomcatAuthentication="false">

                          to

                      <Connector port="8014" protocol="AJP/1.3" redirectPort="8447" tomcatAuthentication="false" maxThreads="250" connectionTimeout="60000">

                       

                       

                      Thanks & Greetings,

                      Christian

                      • 8. Re: ColdFusion 10 instance/Tomcat dying at predictable intervals (white screen of death)
                        Dink477 Level 1

                        I believe that when we deleted and then rebuilt the connector, the redirectPort value was changed. It is definitely not something we changed on purpose.