Bug 1276

Summary:	MT_Timeout test crashes
Product:	TAO	Reporter:	Nanbor Wang <bala>
Component:	ORB	Assignee:	Nanbor Wang <bala>
Status:	RESOLVED FIXED
Severity:	normal	CC:	duane.binder, steve.vranyes, tao-support
Priority:	P3
Version:	1.2.5
Hardware:	All
OS:	All
Bug Depends on:
Bug Blocks:	1277

Description Nanbor Wang 2002-08-09 23:46:32 CDT

I have spent enough time looking at this crash. This is just a placeholder and
a reminder. I see two crashes (run the test continuously for a long time in a
for loop); here are the stack traces:
-------------------------------------------------------------------
#0  ACE_Intrusive_List_Node<TAO_LF_Follower>::next (this=0x554f454d)
    at /project/charanga/bala/ACE_wrappers/ace/Intrusive_List_Node.inl:18
#1  0x4021ffbf in ACE_Intrusive_List<TAO_LF_Follower>::remove (this=0x41014ac4,
    node=0x41014a60)
    at /project/charanga/bala/ACE_wrappers/ace/Intrusive_List.cpp:92
#2  0x402df106 in TAO_Leader_Follower::remove_follower (this=0x41014a98,
    follower=0x41014a60)
    at /project/charanga/bala/ACE_wrappers/TAO/tao/Leader_Follower.i:179
#3  0x402dea44 in TAO_LF_Follower::signal (this=0x41014a60)
    at LF_Follower.cpp:33
#4  0x402e2dba in TAO_LF_Event::state_changed (this=0xbf7ff640, new_state=3)
    at LF_Event.cpp:43
#5  0x402f1321 in TAO_Synch_Reply_Dispatcher::dispatch_reply (this=0xbf7ff638,
    params=@0xbeffe978) at Synch_Reply_Dispatcher.cpp:77
#6  0x402ef5d8 in TAO_Muxed_TMS::dispatch_reply (this=0x40e0a9f0,
    params=@0xbeffe978) at Muxed_TMS.cpp:123
#7  0x401e53ae in TAO_Transport::process_parsed_messages (this=0x40e0a910,
    qd=0xbeffea48, rh=@0xbeffef08) at Transport.cpp:1938
#8  0x401e4054 in TAO_Transport::handle_input_i (this=0x40e0a910,
    rh=@0xbeffef08, max_wait_time=0x0) at Transport.cpp:1389
#9  0x402108cf in TAO_IIOP_Connection_Handler::handle_input (this=0x4100b260,
    h=11) at IIOP_Connection_Handler.cpp:378
#10 0x406ebee3 in ACE_TP_Reactor::dispatch_socket_event (this=0x8085128,
    dispatch_info=@0xbeffefac) at TP_Reactor.cpp:778
#11 0x406eb789 in ACE_TP_Reactor::handle_socket_events (this=0x8085128,
    event_count=@0xbeffeff0, guard=@0xbefff02c) at TP_Reactor.cpp:555
#12 0x406eb459 in ACE_TP_Reactor::dispatch_i (this=0x8085128,
    max_wait_time=0xbefff618, guard=@0xbefff02c) at TP_Reactor.cpp:383
#13 0x406ea9e5 in ACE_TP_Reactor::handle_events (this=0x8085128,
    max_wait_time=0xbefff618) at TP_Reactor.cpp:174
#14 0x406e675b in ACE_Reactor::handle_events (this=0x8085118,
    max_wait_time=0xbefff618)
    at /project/charanga/bala/ACE_wrappers/ace/Reactor.i:172
#15 0x402e0e0e in TAO_Leader_Follower::wait_for_event (this=0x80850a8,
    event=0xbefff640, transport=0x4100b408, max_wait_time=0xbefff618)
    at Leader_Follower.cpp:373
#16 0x402eddec in TAO_Wait_On_Leader_Follower::wait (this=0x4100b4a8,
    max_wait_time=0xbefff618, rd=@0xbefff638) at Wait_On_Leader_Follower.cpp:57
#17 0x402a64ee in TAO_GIOP_Synch_Invocation::invoke_i (this=0xbefff364,
    is_locate_request=0 '\0') at Invocation.cpp:636
#18 0x402a72eb in TAO_GIOP_Twoway_Invocation::invoke (this=0xbefff364,
    excepts=0x0, except_count=0) at Invocation.cpp:845
#19 0x080535a4 in Test::_TAO_Sleep_Service_Remote_Proxy_Impl::go_to_sleep (
    this=0x8062664, _collocated_tao_target_=0x807ddcc, microseconds=50000)
    at TestC.cpp:551
#20 0x08054ad3 in Test::Sleep_Service::go_to_sleep (this=0x807ddc0,
    microseconds=50000) at TestC.cpp:1053
#21 0x08051b2e in Client_Task::one_iteration (this=0xbffff6a4)
    at Client_Task.cpp:147
#22 0x080516d9 in Client_Task::svc (this=0xbffff6a4) at Client_Task.cpp:87
#23 0x4074722f in ACE_Task_Base::svc_run (args=0xbffff6a4) at Task.cpp:203
#24 0x406bf31c in ACE_Thread_Adapter::invoke_i (this=0x40d025e8)
    at Thread_Adapter.cpp:150
#25 0x406bf20e in ACE_Thread_Adapter::invoke (this=0x40d025e8)
    at Thread_Adapter.cpp:93
#26 0x40665576 in ace_thread_adapter (args=0x40d025e8)
    at Base_Thread_Adapter.cpp:121
#27 0x4085e0ba in pthread_start_thread () from /lib/libpthread.so.0

----------------------------------------------------------------------
#0  0x00000000 in ?? ()
#1  0x408b0d14 in __si_type_info::dcast ()
    from /usr/lib/libstdc++-libc6.2-2.so.3
#2  0x408b198a in __dynamic_cast () from /usr/lib/libstdc++-libc6.2-2.so.3
#3  0x408b13a1 in __throw_type_match_rtti ()
    from /usr/lib/libstdc++-libc6.2-2.so.3
#4  0x408af374 in __check_eh_spec () from /usr/lib/libstdc++-libc6.2-2.so.3
#5  0x402a70fd in TAO_GIOP_Synch_Invocation::invoke_i (this=0xbe9ff364,
    is_locate_request=0 '\0') at Invocation.cpp:753
#6  0x402a72eb in TAO_GIOP_Twoway_Invocation::invoke (this=0xbe9ff364,
    excepts=0x0, except_count=0) at Invocation.cpp:845
#7  0x080535a4 in Test::_TAO_Sleep_Service_Remote_Proxy_Impl::go_to_sleep (
    this=0x8062664, _collocated_tao_target_=0x807ddcc, microseconds=50000)
    at TestC.cpp:551
#8  0x08054ad3 in Test::Sleep_Service::go_to_sleep (this=0x807ddc0,
    microseconds=50000) at TestC.cpp:1053
#9  0x08051b2e in Client_Task::one_iteration (this=0xbffff6a4)
    at Client_Task.cpp:147
#10 0x080516d9 in Client_Task::svc (this=0xbffff6a4) at Client_Task.cpp:87
#11 0x4074722f in ACE_Task_Base::svc_run (args=0xbffff6a4) at Task.cpp:203
#12 0x406bf31c in ACE_Thread_Adapter::invoke_i (this=0x80a1ba8)
    at Thread_Adapter.cpp:150
#13 0x406bf20e in ACE_Thread_Adapter::invoke (this=0x80a1ba8)
    at Thread_Adapter.cpp:93
#14 0x40665576 in ace_thread_adapter (args=0x80a1ba8)
    at Base_Thread_Adapter.cpp:121
#15 0x4085e0ba in pthread_start_thread () from /lib/libpthread.so.0
-------------------------------------------------------------------

Need to look into this when I have time, after I reach Vanderbilt!

Comment 1 Nanbor Wang 2002-08-09 23:46:54 CDT

Accepted

Comment 2 Carlos O'Ryan 2002-08-10 21:21:42 CDT

Crashes are blockers for the 1.3 release.

Comment 3 Nanbor Wang 2002-11-19 13:57:00 CST

This appears with the latest version too, so I have updated the version info.

Okay, here is the scoop. It looks to me like the Roundtrip Timeout implementation
is fragile, hence the problems. Imagine this scenario:

1. Two threads (T1 and T2) send requests, each with a roundtrip (RT) timeout
set at, say, the ORB level, and wait for their replies.

2. Say T1 becomes the leader and drives the reactor event loop while T2 is a
follower.

3. T1 receives a reply that actually happens to be T2's reply.

4. At this point, T2 times out and starts unwinding its stack.

5. T1, after collecting the reply, tries to get the reply dispatcher from the
table and, if it finds one, calls dispatch_reply () on it.

6. Since the reply dispatcher was created on T2's stack, it is destroyed when T2
unwinds, and T1 is left holding dangling pointers.

7. Things get even messier with the Exclusive connection strategy.

Looks like that portion of the ORB is very very fragile!
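
To make the lifetime problem in steps 4 to 6 concrete, here is a minimal
sketch of one way such a race can be made harmless. It is not the fix that
went into TAO and does not use TAO's classes: ReplyDispatcher, DispatcherTable,
Outcome and replay_timeout_scenario are all hypothetical names, and
std::shared_ptr and std::mutex are assumptions of the sketch (a modern C++
standard library). The idea is that the table shares ownership of the
dispatcher, and the leader removes the entry and obtains an owning reference in
one locked step, so the waiter's stack unwinding can no longer destroy an
object the leader is about to use. The two interleavings are replayed
sequentially rather than with real threads so the result is deterministic.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical stand-in for a synchronous reply dispatcher (not TAO's class).
struct ReplyDispatcher {
    bool reply_received = false;
    int reply_status = 0;

    void dispatch_reply(int status) {
        reply_status = status;
        reply_received = true;
    }
};

// Hypothetical request-id -> dispatcher table. It shares ownership of each
// dispatcher instead of storing a raw pointer into the waiter's stack.
class DispatcherTable {
public:
    void bind(std::uint32_t request_id, std::shared_ptr<ReplyDispatcher> rd) {
        std::lock_guard<std::mutex> guard(lock_);
        table_[request_id] = std::move(rd);
    }

    // Waiter side (T2 on timeout). Returns true if the entry was still bound.
    bool unbind(std::uint32_t request_id) {
        std::lock_guard<std::mutex> guard(lock_);
        return table_.erase(request_id) == 1;
    }

    // Leader side (T1). Lookup and removal happen under one lock, and the
    // caller receives an owning pointer, or nullptr if the waiter gave up.
    std::shared_ptr<ReplyDispatcher> take(std::uint32_t request_id) {
        std::lock_guard<std::mutex> guard(lock_);
        auto it = table_.find(request_id);
        if (it == table_.end()) {
            return nullptr;
        }
        std::shared_ptr<ReplyDispatcher> rd = std::move(it->second);
        table_.erase(it);
        return rd;
    }

private:
    std::mutex lock_;
    std::map<std::uint32_t, std::shared_ptr<ReplyDispatcher>> table_;
};

struct Outcome {
    bool early_reply_delivered;  // leader won the race, dispatch was safe
    bool late_reply_dropped;     // waiter won the race, reply was discarded
};

// Replays both orderings of steps 4 and 5 in a single thread.
Outcome replay_timeout_scenario() {
    DispatcherTable table;
    Outcome out{};

    {   // Ordering A: T1 fetches the dispatcher just before T2 times out.
        auto t2_dispatcher = std::make_shared<ReplyDispatcher>();
        table.bind(1, t2_dispatcher);
        auto held_by_leader = table.take(1);  // T1 now co-owns the object
        bool was_bound = table.unbind(1);     // T2 times out: nothing to remove
        t2_dispatcher.reset();                // T2 unwinds its stack
        held_by_leader->dispatch_reply(1);    // arbitrary status; still valid
        out.early_reply_delivered = !was_bound && held_by_leader->reply_received;
    }

    {   // Ordering B: T2 times out first, then its reply arrives at T1.
        auto t2_dispatcher = std::make_shared<ReplyDispatcher>();
        table.bind(2, t2_dispatcher);
        table.unbind(2);                      // T2 removes its own entry
        t2_dispatcher.reset();                // T2 unwinds its stack
        out.late_reply_dropped = (table.take(2) == nullptr);
    }

    return out;
}
```

Calling replay_timeout_scenario () from a test driver exercises both orderings.
With a raw pointer to a stack object in the table, ordering A is the one that
leaves T1 with a pointer to memory that no longer holds a dispatcher.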

Comment 4 Nanbor Wang 2002-12-24 07:03:28 CST

Fixed! The relevant changelog entries are 

Sun Dec 22 11:26:30 2002  Balachandran Natarajan  <bala@isis-server.isis.vanderbilt.edu>
Mon Dec 23 22:33:56 2002  Balachandran Natarajan  <bala@isis-server.isis.vanderbilt.edu>