This report describes a deadlock discovered in an alternate version of the 'ACE_wrappers/TAO/tests/Big_Request_Muxing' test. The tarred code for this was sent out on the devo-group. Hopefully it will be incorporated into the DOC repo; otherwise I will cut and paste the relevant changes into the ticket.

Broadly, in this modified test the client spawns 6 threads. Each thread pair overrides its Messaging::SYNC_SCOPE_POLICY_TYPE to Messaging::SYNC_WITH_TRANSPORT, Messaging::SYNC_WITH_TARGET and Messaging::SYNC_NONE respectively. Each thread then invokes more_data() 1000 times. The deadlock happens on one of the Messaging::SYNC_NONE threads.
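For context, each client thread presumably applies its override with the standard CORBA Messaging idiom, along these lines. This is a hedged sketch, not the test's literal code: the helper name with_sync_scope and its parameter names are mine; Test::Payload_Receiver is the test's IDL type.

// Hedged sketch (not the test's actual code): give one thread's object
// reference its own sync scope override.
#include "tao/Messaging/Messaging.h"
#include "TestC.h"

Test::Payload_Receiver_ptr
with_sync_scope (CORBA::ORB_ptr orb,
                 Test::Payload_Receiver_ptr receiver,
                 Messaging::SyncScope scope)  // e.g. Messaging::SYNC_NONE
{
  CORBA::Any scope_as_any;
  scope_as_any <<= scope;

  CORBA::PolicyList policies (1);
  policies.length (1);
  policies[0] =
    orb->create_policy (Messaging::SYNC_SCOPE_POLICY_TYPE, scope_as_any);

  // The override applies to this reference only, so each thread can hold
  // a reference with a different sync scope.
  CORBA::Object_var tmp =
    receiver->_set_policy_overrides (policies, CORBA::SET_OVERRIDE);
  policies[0]->destroy ();

  return Test::Payload_Receiver::_narrow (tmp.in ());
}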
Here's the stack trace:

#0  0x4093da27 in select () from /lib/tls/libc.so.6
#1  0x40620099 in ACE_OS::select (width=10, rfds=0x807a8dc, wfds=0x0, efds=0x0, timeout=0x0) at OS_NS_sys_select.inl:39
#2  0x40634ed0 in ACE_Select_Reactor_T<ACE_Select_Reactor_Token_T<ACE_Token> >::wait_for_multiple_events (this=0x807a3d0, dispatch_set=@0x807a8d0, max_wait_time=0x0) at Select_Reactor_T.cpp:1142
#3  0x4064403f in ACE_TP_Reactor::get_event_for_dispatching (this=0x807a3d0, max_wait_time=0x0) at TP_Reactor.cpp:556
#4  0x40643a85 in ACE_TP_Reactor::dispatch_i (this=0x807a3d0, max_wait_time=0x0, guard=@0x429b2330) at TP_Reactor.cpp:250
#5  0x406438e6 in ACE_TP_Reactor::handle_events (this=0x807a3d0, max_wait_time=0x0) at TP_Reactor.cpp:172
#6  0x4063f91d in ACE_Reactor::handle_events (this=0x807ad20, max_wait_time=0x0) at Reactor.inl:166
#7  0x4043f860 in TAO_ORB_Core::run (this=0x806d580, tv=0x0, perform_work=1) at ORB_Core.cpp:1878
#8  0x4044a133 in TAO_Leader_Follower_Flushing_Strategy::flush_transport (this=0x807b5d8, transport=0x80816d8) at Leader_Follower_Flushing_Strategy.cpp:55
#9  0x40373711 in TAO_Transport::send_message_shared_i (this=0x80816d8, stub=0x807d1c8, message_semantics=0, message_block=0x8087cc0, max_wait_time=0x0) at Transport.cpp:1187
#10 0x4038433f in TAO_IIOP_Transport::send_message_shared (this=0x80816d8, stub=0x807d1c8, message_semantics=0, message_block=0x8087cc0, max_wait_time=0x0) at IIOP_Transport.cpp:187
#11 0x4038424e in TAO_IIOP_Transport::send_message (this=0x80816d8, stream=@0x8087cc0, stub=0x807d1c8, message_semantics=0, max_wait_time=0x0) at IIOP_Transport.cpp:156
#12 0x403841bf in TAO_IIOP_Transport::send_request (this=0x80816d8, stub=0x807d1c8, orb_core=0x806d580, stream=@0x8087cc0, message_semantics=0, max_wait_time=0x0) at IIOP_Transport.cpp:133
#13 0x40405b54 in TAO::Remote_Invocation::send_message (this=0x429b2670, cdr=@0x8087cc0, message_semantics=0, max_wait_time=0x0) at Remote_Invocation.cpp:165
#14 0x40407b3a in TAO::Synch_Oneway_Invocation::remote_oneway (this=0x429b2670, max_wait_time=0x0) at Synch_Invocation.cpp:715
#15 0x40404377 in TAO::Invocation_Adapter::invoke_oneway (this=0x429b2870, details=@0x429b27e0, effective_target=@0x429b27a0, r=@0x429b2720, max_wait_time=@0x429b2798) at Invocation_Adapter.cpp:342
#16 0x4040410b in TAO::Invocation_Adapter::invoke_remote_i (this=0x429b2870, stub=0x807d1c8, details=@0x429b27e0, effective_target=@0x429b27a0, max_wait_time=@0x429b2798) at Invocation_Adapter.cpp:268
#17 0x40403bc5 in TAO::Invocation_Adapter::invoke_i (this=0x429b2870, stub=0x807d1c8, details=@0x429b27e0) at Invocation_Adapter.cpp:86
#18 0x40403ab8 in TAO::Invocation_Adapter::invoke (this=0x429b2870, ex_data=0x0, ex_count=0) at Invocation_Adapter.cpp:45
#19 0x0804fa4b in Test::Payload_Receiver::more_data (this=0x805d5d8, the_payload=@0x429b2900) at TestC.cpp:255
#20 0x080525a0 in Client_Task::validate_connection (this=0xbffff5b0) at Client_Task.cpp:97
#21 0x0805210b in Client_Task::svc (this=0xbffff5b0) at Client_Task.cpp:43
#22 0x406cc487 in ACE_Task_Base::svc_run (args=0xbffff5b0) at Task.cpp:204
#23 0x406899c6 in ACE_Thread_Adapter::invoke_i (this=0x807cac0) at Thread_Adapter.cpp:150
#24 0x406898ff in ACE_Thread_Adapter::invoke (this=0x807cac0) at Thread_Adapter.cpp:94
#25 0x4064f262 in ace_thread_adapter (args=0x807cac0) at Base_Thread_Adapter.cpp:132
#26 0x40748b63 in start_thread () from /lib/tls/libpthread.so.0
#27 0x4094418a in clone () from /lib/tls/libc.so.6

As can be seen, the client thread is waiting for events in select(). What normally happens is this:

- In TAO_Transport::send_message_shared_i(), upon detecting that its queue needs to be flushed, the transport sets ACE_Event_Handler::WRITE_MASK for its event handler via TAO_Transport::schedule_output_i().
- It then calls flush_transport() on its flushing strategy, which goes into select() and, upon receiving the WRITE_MASK trigger, flushes out the queue.

However, in this deadlock the select has NULL for the write masks (note wfds=0x0 in frame #1 above). Somehow this mask got cleared before the thread entered select. Since there is no event trigger, the thread deadlocks. A minimal sketch of this schedule/flush handshake appears at the end of this comment. I am debugging this for now and could use any help/advice. Ciju
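Here is the promised sketch of the schedule-output/flush handshake, using a plain ACE reactor and a pipe instead of a real TAO transport. The Flusher class and the pipe are stand-ins of my own, not TAO code; the point is only that WRITE_MASK must still be set when the thread enters the event loop, or select() blocks forever.

// Hedged sketch: models the schedule_output_i()/flush_transport()
// handshake with a plain ACE reactor. Flusher is an illustrative name.
#include "ace/Reactor.h"
#include "ace/Event_Handler.h"
#include "ace/Pipe.h"
#include "ace/Log_Msg.h"
#include "ace/OS_main.h"

// Stand-in for the transport's event handler; handle_output() plays the
// role of the queue-flushing callback.
class Flusher : public ACE_Event_Handler
{
public:
  Flusher (ACE_HANDLE h) : handle_ (h) {}

  virtual ACE_HANDLE get_handle (void) const { return this->handle_; }

  // Analogue of TAO_Transport::schedule_output_i(): arm WRITE_MASK so
  // the reactor's select() wakes up when the handle becomes writable.
  int schedule_output (void)
  {
    return this->reactor ()->register_handler
      (this->handle_, this, ACE_Event_Handler::WRITE_MASK);
  }

  // Dispatched once select() reports writability; this is where the
  // transport would drain its outgoing queue.
  virtual int handle_output (ACE_HANDLE)
  {
    ACE_DEBUG ((LM_DEBUG, "flushing queued data\n"));
    this->reactor ()->end_reactor_event_loop ();
    return -1; // queue drained: drop the WRITE_MASK interest
  }

private:
  ACE_HANDLE handle_;
};

int ACE_TMAIN (int, ACE_TCHAR *[])
{
  ACE_Pipe pipe;
  if (pipe.open () == -1)
    return 1;

  Flusher flusher (pipe.write_handle ());
  flusher.reactor (ACE_Reactor::instance ());

  // Step 1: set WRITE_MASK. If anything cleared the mask between here
  // and the event loop below, the thread would block in select() with
  // no write interest -- exactly the deadlock in the trace above.
  flusher.schedule_output ();

  // Step 2: enter the event loop, as flush_transport() does via
  // TAO_ORB_Core::run().
  ACE_Reactor::instance ()->run_reactor_event_loop ();
  return 0;
}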
I believe this issue is no longer valid. Marking it as invalid. Ciju
John, are the changes (the regression) you mention in svn? I am reopening this, as I am not sure you would see my question otherwise. If the regression is not in svn, could you add it for this bug?
The said modifications to the Big_Request_Muxing test were added in:

Thu Sep 1 16:56:12 2005 Ciju John <john_c@ociweb.com>

I believe this issue may still be valid: even after recent extensive modifications from Simon Massey, there still seems to be an indication of a deadlock. Look at the test failure on Solaris10_Studio11_Debug from today (2007_07_25_07_29). The test failed because the server timed out waiting for the client's payloads, and I don't believe this is related to the test timeout. However, this failure is so rare that I am not sure it really is a deadlock problem. Since the test passes on most platforms, I recommend marking this issue as resolved. Ciju