1 Reply Latest reply on Dec 16, 2009 8:03 AM by Zvranic

    Database failover recovery

    Zvranic

      For our customers we built robust architecture with LC in horizonal cluster and MS SQL Server database running in Windows cluster (active-passive configuration).

       

      While we were testing database failover (moving all db services from one node to another) some strange things started to happen.

      After services were down on one server, LC started to generate a lot of errors in server log. After services were up again on another server errors continued. For 50 minutes! During all that time contenspace application was inaccesible - reporting error (on both servers in a cluster when accesed individuallly).

      Result was the same if we just took db down and up again on the same db server in a cluster. Our LC is simply unable to make gracefull recovery if its database connections are dropped at some point.

       

      During that time server.log grew to 200+MB.

       

      Errors that were generated pretty much followed the same pattern.

       

       

      2009-12-16 15:23:06,729 WARN  [org.hibernate.util.JDBCExceptionReporter] SQL Error: 0, SQLState: null

      2009-12-16 15:23:06,729 ERROR [org.hibernate.util.JDBCExceptionReporter] The connection is closed.

      2009-12-16 15:23:06,729 ERROR [org.hibernate.transaction.JDBCTransaction] JDBC rollback failed

      com.microsoft.sqlserver.jdbc.SQLServerException: The connection is closed.

      at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(Unknown Source)

      at com.microsoft.sqlserver.jdbc.SQLServerConnection.checkClosed(Unknown Source)

      at com.microsoft.sqlserver.jdbc.SQLServerConnection.rollback(Unknown Source)

      at org.jboss.resource.adapter.jdbc.BaseWrapperManagedConnection.jdbcRollback(BaseWrapperMana gedConnection.java:448)

      at org.jboss.resource.adapter.jdbc.WrappedConnection.rollback(WrappedConnection.java:337)

      at org.hibernate.transaction.JDBCTransaction.rollbackAndResetAutoCommit(JDBCTransaction.java :183)

      at org.hibernate.transaction.JDBCTransaction.rollback(JDBCTransaction.java:162)

      at org.springframework.orm.hibernate3.HibernateTransactionManager.doRollback(HibernateTransa ctionManager.java:581)

      at org.springframework.transaction.support.AbstractPlatformTransactionManager.processRollbac k(AbstractPlatformTransactionManager.java:745)

      at org.springframework.transaction.support.AbstractPlatformTransactionManager.rollback(Abstr actPlatformTransactionManager.java:722)

      at org.springframework.transaction.interceptor.TransactionAspectSupport.completeTransactionA fterThrowing(TransactionAspectSupport.java:325)

      at org.alfresco.util.transaction.SpringAwareUserTransaction.rollback(SpringAwareUserTransact ion.java:498)

      at org.alfresco.repo.transaction.RetryingTransactionHelper.doInTransaction(RetryingTransacti onHelper.java:346)

      at org.alfresco.repo.node.index.IndexTransactionTracker.reindexImpl(IndexTransactionTracker. java:203)

      at org.alfresco.repo.node.index.AbstractReindexComponent$1.execute(AbstractReindexComponent. java:252)

      at org.alfresco.repo.node.index.AbstractReindexComponent.reindex(AbstractReindexComponent.ja va:262)

      at org.alfresco.repo.node.index.IndexRecoveryJob.execute(IndexRecoveryJob.java:40)

      at org.quartz.core.JobRunShell.run(JobRunShell.java:202)

      at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:529)

      2009-12-16 15:23:06,729 ERROR [org.alfresco.util.transaction.SpringAwareUserTransaction] Application exception overridden by rollback exception

      java.lang.Exception

      at org.alfresco.util.transaction.SpringAwareUserTransaction.rollback(SpringAwareUserTransact ion.java:498)

      at org.alfresco.repo.transaction.RetryingTransactionHelper.doInTransaction(RetryingTransacti onHelper.java:346)

      at org.alfresco.repo.node.index.IndexTransactionTracker.reindexImpl(IndexTransactionTracker. java:203)

      at org.alfresco.repo.node.index.AbstractReindexComponent$1.execute(AbstractReindexComponent. java:252)

      at org.alfresco.repo.node.index.AbstractReindexComponent.reindex(AbstractReindexComponent.ja va:262)

      at org.alfresco.repo.node.index.IndexRecoveryJob.execute(IndexRecoveryJob.java:40)

      at org.quartz.core.JobRunShell.run(JobRunShell.java:202)

      at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:529)

      2009-12-16 15:23:06,729 ERROR [org.alfresco.repo.transaction.RetryingTransactionHelper] Rollback failure.  Normal retry behaviour will resume.

      org.springframework.transaction.TransactionSystemException: Could not roll back Hibernate transaction; nested exception is org.hibernate.TransactionException: JDBC rollback failed

      Caused by:

      org.hibernate.TransactionException: JDBC rollback failed

      at org.hibernate.transaction.JDBCTransaction.rollback(JDBCTransaction.java:170)

      at org.springframework.orm.hibernate3.HibernateTransactionManager.doRollback(HibernateTransa ctionManager.java:581)

      at org.springframework.transaction.support.AbstractPlatformTransactionManager.processRollbac k(AbstractPlatformTransactionManager.java:745)

      at org.springframework.transaction.support.AbstractPlatformTransactionManager.rollback(Abstr actPlatformTransactionManager.java:722)

      at org.springframework.transaction.interceptor.TransactionAspectSupport.completeTransactionA fterThrowing(TransactionAspectSupport.java:325)

      at org.alfresco.util.transaction.SpringAwareUserTransaction.rollback(SpringAwareUserTransact ion.java:498)

      at org.alfresco.repo.transaction.RetryingTransactionHelper.doInTransaction(RetryingTransacti onHelper.java:346)

      at org.alfresco.repo.node.index.IndexTransactionTracker.reindexImpl(IndexTransactionTracker. java:203)

      at org.alfresco.repo.node.index.AbstractReindexComponent$1.execute(AbstractReindexComponent. java:252)

      at org.alfresco.repo.node.index.AbstractReindexComponent.reindex(AbstractReindexComponent.ja va:262)

      at org.alfresco.repo.node.index.IndexRecoveryJob.execute(IndexRecoveryJob.java:40)

      at org.quartz.core.JobRunShell.run(JobRunShell.java:202)

      at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:529)

      Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: The connection is closed.

      at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(Unknown Source)

      at com.microsoft.sqlserver.jdbc.SQLServerConnection.checkClosed(Unknown Source)

      at com.microsoft.sqlserver.jdbc.SQLServerConnection.rollback(Unknown Source)

      at org.jboss.resource.adapter.jdbc.BaseWrapperManagedConnection.jdbcRollback(BaseWrapperMana gedConnection.java:448)

      at org.jboss.resource.adapter.jdbc.WrappedConnection.rollback(WrappedConnection.java:337)

      at org.hibernate.transaction.JDBCTransaction.rollbackAndResetAutoCommit(JDBCTransaction.java :183)

      at org.hibernate.transaction.JDBCTransaction.rollback(JDBCTransaction.java:162)

      ... 12 more

      2009-12-16 15:23:06,729 WARN  [org.hibernate.jdbc.AbstractBatcher] exception clearing maxRows/queryTimeout

      com.microsoft.sqlserver.jdbc.SQLServerException: The connection is closed.

      at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(Unknown Source)

      at com.microsoft.sqlserver.jdbc.SQLServerConnection.checkClosed(Unknown Source)

      at com.microsoft.sqlserver.jdbc.SQLServerStatement.checkClosed(Unknown Source)

      at com.microsoft.sqlserver.jdbc.SQLServerStatement.getMaxRows(Unknown Source)

      at org.jboss.resource.adapter.jdbc.CachedPreparedStatement.getMaxRows(CachedPreparedStatemen t.java:352)

      at org.jboss.resource.adapter.jdbc.WrappedStatement.getMaxRows(WrappedStatement.java:244)

      at org.hibernate.jdbc.AbstractBatcher.closeQueryStatement(AbstractBatcher.java:272)

      at org.hibernate.jdbc.AbstractBatcher.closeQueryStatement(AbstractBatcher.java:209)

      at org.hibernate.loader.Loader.getResultSet(Loader.java:1792)

      at org.hibernate.loader.Loader.doQuery(Loader.java:662)

      at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:224)

      at org.hibernate.loader.Loader.doList(Loader.java:2211)

      at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2095)

      at org.hibernate.loader.Loader.list(Loader.java:2090)

      at org.hibernate.loader.hql.QueryLoader.list(QueryLoader.java:388)

      at org.hibernate.hql.ast.QueryTranslatorImpl.list(QueryTranslatorImpl.java:338)

      at org.hibernate.engine.query.HQLQueryPlan.performList(HQLQueryPlan.java:172)

      at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1121)

      at org.hibernate.impl.QueryImpl.list(QueryImpl.java:79)

      at org.alfresco.repo.avm.hibernate.AVMStoreDAOHibernate.getAll(AVMStoreDAOHibernate.java:89)

      at org.alfresco.repo.avm.AVMRepository.getAVMStores(AVMRepository.java:1088)

      at org.alfresco.repo.avm.AVMServiceImpl.getStores(AVMServiceImpl.java:876)

      at sun.reflect.GeneratedMethodAccessor312.invoke(Unknown Source)

      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

      at java.lang.reflect.Method.invoke(Method.java:585)

      at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:281 )

      at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMe thodInvocation.java:187)

      at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvo cation.java:154)

      at org.alfresco.repo.transaction.SingleEntryTransactionResourceInterceptor.invokeInternal(Si ngleEntryTransactionResourceInterceptor.java:163)

      at org.alfresco.repo.transaction.SingleEntryTransactionResourceInterceptor.invoke(SingleEntr yTransactionResourceInterceptor.java:138)

      at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvo cation.java:176)

      at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:210)

      at $Proxy50.getStores(Unknown Source)

      at org.alfresco.repo.node.index.AVMRemoteSnapshotTracker.processStores(AVMRemoteSnapshotTrac ker.java:73)

      at org.alfresco.repo.node.index.AVMRemoteSnapshotTracker.reindexImpl(AVMRemoteSnapshotTracke r.java:64)

      at org.alfresco.repo.node.index.AbstractReindexComponent$1.execute(AbstractReindexComponent. java:252)

      at org.alfresco.repo.transaction.RetryingTransactionHelper.doInTransaction(RetryingTransacti onHelper.java:278)

      at org.alfresco.repo.transaction.RetryingTransactionHelper.doInTransaction(RetryingTransacti onHelper.java:227)

      at org.alfresco.repo.node.index.AbstractReindexComponent.reindex(AbstractReindexComponent.ja va:258)

      at org.alfresco.repo.node.index.IndexRecoveryJob.execute(IndexRecoveryJob.java:40)

      at org.quartz.core.JobRunShell.run(JobRunShell.java:202)

      at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:529)

       

       

      Is that maybe expected behaviour? Or it could be something in our architecture/configuration? Maybe tweaking some of datasource properties.

      Our system is otherwise stable, and we have several workflows running without problems.

       

      Any suggestion is appreciated!

       

      Zeljko Vranic