
HashMap$AbstractMapIterator.hasNext() eating 20% of CPU cycles in LCDS 4.6


Level 1

We are doing some load testing on the new LCDS 4.6 with the RTMP channel and noticed pretty high CPU usage from this method:

flex.messaging.socketserver.Reactor$ReactorImpl.run()

The cause seems to be a call to HashMap$AbstractMapIterator.hasNext().

Any ideas?

PS: LCDS 3.1 did not have this problem.

6 Replies


Employee

Hi,

Can you attach a stack trace or other details that would help us investigate?

Rohit


Level 1

Hi, please see attached.

Thread Name WorkManager.LCDSWorkManager : 2-rtmp-server-Reactor3
State Waiting on condition
Java Stack at java/util/HashMap$AbstractMapIterator.hasNext(HashMap.java:112(Compiled Code))
at flex/messaging/socketserver/Reactor$ReactorImpl.run(Reactor.java:427(Compiled Code))
at flex/messaging/util/concurrent/AsynchBeansWorkManagerExecutor$WorkCommandWrapper.run(AsynchBeansWorkManagerExecutor.java:186)
at com/ibm/ws/asynchbeans/J2EEContext$RunProxy.run(J2EEContext.java:268)
at java/security/AccessController.doPrivileged(AccessController.java:224)
at javax/security/auth/Subject.doAs(Subject.java:495)
at com/ibm/websphere/security/auth/WSSubject.doAs(WSSubject.java:131)
at com/ibm/websphere/security/auth/WSSubject.doAs(WSSubject.java:89)
at com/ibm/ws/asynchbeans/J2EEContext$DoAsProxy.run(J2EEContext.java:335)
at java/security/AccessController.doPrivileged(AccessController.java:251(Compiled Code))
at com/ibm/ws/asynchbeans/J2EEContext.run(J2EEContext.java:777)
at com/ibm/ws/asynchbeans/WorkWithExecutionContextImpl.go(WorkWithExecutionContextImpl.java:222)
at com/ibm/ws/asynchbeans/ABWorkItemImpl.run(ABWorkItemImpl.java:159)
at com/ibm/ws/util/ThreadPool$Worker.run(ThreadPool.java:1604)
Native Stack (0x00002AAAAB06D7A2 [libj9prt24.so+0xe7a2])
(0x00002AAAAB077BF1 [libj9prt24.so+0x18bf1])
(0x00002AAAAB06D82D [libj9prt24.so+0xe82d])
(0x00002AAAAB06D93A [libj9prt24.so+0xe93a])
(0x00002AAAAB06D5E4 [libj9prt24.so+0xe5e4])
(0x00002AAAAB077BF1 [libj9prt24.so+0x18bf1])
(0x00002AAAAB06D65D [libj9prt24.so+0xe65d])
(0x00002AAAAB068E5F [libj9prt24.so+0x9e5f])
(0x0000003849C0EB10 [libpthread.so.0+0xeb10])
pthread_cond_wait+0xb9 (0x0000003849C0AEE9 [libpthread.so.0+0xaee9])
(0x00002AAAAAE5460B [libj9thr24.so+0x660b])
(0x00002AAAAAD25009 [libj9vm24.so+0x51009])
(0x00002AAAAAD2502E [libj9vm24.so+0x5102e])
(0x00002AAAAACDE2B9 [libj9vm24.so+0xa2b9])


Employee

We have recently checked in a fix to the Reactor to work around a JDK bug. The symptom of that bug is the same, i.e. a CPU usage spike, but the stack is different, as shown below.

Also, I don't think a waiting thread (as in your stack above) can consume CPU resources, so please look for threads that are alive and running (not blocked or waiting) in your stack trace. The only place in ReactorImpl.run() that uses an iterator is where we loop over the selector's selected keys. That call sits in the middle of the loop that calls select(), which (due to the JDK bug) spins in an infinite loop, so your profiler may be attributing the hotspot to the iterator. This is just a guess.
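The LCDS Reactor source isn't shown here, so the following is only a sketch of the typical NIO select loop being described, not the actual LCDS code; `processReadyKeys` is an illustrative name. It shows why, when select() spins, a profiler can pin the time on the selected-keys iterator:

```java
import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;

public class ReactorLoop {
    // One pass of a typical NIO reactor loop. In a real reactor this runs
    // inside while(true); if select() returns 0 immediately (the JDK epoll
    // bug), the loop spins and the iterator calls dominate the profile.
    static int processReadyKeys(Selector selector, long timeoutMs) throws IOException {
        selector.select(timeoutMs);
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        int handled = 0;
        while (it.hasNext()) {      // the hasNext() seen in the profile
            SelectionKey key = it.next();
            it.remove();            // must remove, or the key is reported again
            // ... dispatch read/write/accept for this key ...
            handled++;
        }
        return handled;
    }
}
```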

Take a look at the stack that's been reported to us:

"neo-nio-server-Reactor1" id=157 idx=0x264 tid=22394 prio=5 alive, in native, daemon
    at sun/nio/ch/EPollArrayWrapper.epollWait(JIJI)I(Native Method)
    at sun/nio/ch/EPollArrayWrapper.poll(EPollArrayWrapper.java:210)[inlined]
    at sun/nio/ch/EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)[inlined]
    at sun/nio/ch/SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)[inlined]
    at sun/nio/ch/SelectorImpl.select(SelectorImpl.java:80)[inlined]
    at sun/nio/ch/SelectorImpl.select(SelectorImpl.java:84)[optimized]
    ^-- Holding lock: sun/nio/ch/Util$2@0x1157a6600[biased lock]
    ^-- Holding lock: java/util/Collections$UnmodifiableSet@0x1157a65f0[biased lock]
    ^-- Holding lock: sun/nio/ch/EPollSelectorImpl@0x1157a58b0[biased lock]
    at flex/messaging/socketserver/Reactor$ReactorImpl.run(Reactor.java:523)
    at java/lang/Thread.run(Thread.java:662)
    at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
    -- end of trace

It turns out to be an infamous JDK bug in epollWait() where select() doesn't block when it's supposed to.

The following links help to better understand the problem and the workaround:

https://issues.apache.org/jira/browse/DIRMINA-678

https://github.com/netty/netty/issues/327

https://github.com/netty/netty/pull/565 (workaround)

http://bugs.sun.com/view_bug.do?bug_id=6403933 (official Oracle JDK bug)

https://forums.oracle.com/forums/thread.jspa?threadID=2424401&tstart=0 (unanswered)

http://www.java.net/node/670383 (another app [orbd] affected by the same issue)
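The common workaround (the approach taken in the Netty pull request above) is to detect select() calls that return prematurely and, past some threshold, rebuild the selector. The sketch below is not the actual LCDS patch; `SPIN_THRESHOLD`, the method names, and the detection heuristic are assumptions for illustration:

```java
import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

public class SelectorRebuild {
    static final int SPIN_THRESHOLD = 512; // assumption: arbitrary cutoff

    // Detect the epoll spin: select() returns 0 ready keys almost
    // instantly, before the timeout has elapsed. After SPIN_THRESHOLD
    // such returns in a row, replace the broken selector.
    static Selector selectWithRebuild(Selector selector, long timeoutMs) throws IOException {
        int prematureReturns = 0;
        while (true) {
            long start = System.nanoTime();
            int ready = selector.select(timeoutMs);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            if (ready > 0 || elapsedMs >= timeoutMs) {
                return selector; // normal behavior: keys ready or timeout hit
            }
            if (++prematureReturns >= SPIN_THRESHOLD) {
                selector = rebuild(selector);
                prematureReturns = 0;
            }
        }
    }

    // Re-register every valid key's channel with a fresh selector,
    // preserving interest ops and attachments, then close the old one.
    static Selector rebuild(Selector old) throws IOException {
        Selector fresh = Selector.open();
        for (SelectionKey key : old.keys()) {
            if (key.isValid()) {
                key.channel().register(fresh, key.interestOps(), key.attachment());
                key.cancel();
            }
        }
        old.close();
        return fresh;
    }
}
```

Rebuilding works because the corrupted epoll state lives in the old Selector instance; a fresh Selector gets a fresh epoll descriptor.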

If you wish to obtain this patch, please open a ticket with support. We haven't qualified the patch yet, so I won't completely rule out changes to the workaround or vouch for its efficacy.


Level 1

We are just evaluating this product; can you please provide the patch here?


Employee

Not sure I can. Please send me a PM with your email and company name and I'll have someone from Product Management follow up with you.