Bug 2395 - TP_Reactor invokes same eventhander multiple times
Summary: TP_Reactor invokes same eventhander multiple times
Status: RESOLVED FIXED
Alias: None
Product: ACE
Classification: Unclassified
Component: ACE Core (show other bugs)
Version: 5.4.8
Hardware: All All
: P1 critical
Assignee: Johnny Willemsen
URL:
Depends on:
Blocks: 2895
  Show dependency tree
 
Reported: 2006-02-06 07:28 CST by Johnny Willemsen
Modified: 2007-04-13 09:16 CDT (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Johnny Willemsen 2006-02-06 07:28:13 CST
We have found an issue with the TP_Reactor. It seems that sometimes the
TP_Reactor when run by multiple threads invokes the same event handler from
multiple threads, be more precise, the handle_input is called from multiple
threads when registered for read. After x.4.9 is out we will add an example to
cvs that reproduces this behaviour.
Comment 1 Johnny Willemsen 2006-02-06 08:11:16 CST
increased priority and accept
Comment 2 Johnny Willemsen 2006-02-23 03:17:16 CST
Thu Feb 23 09:16:12 UTC 2006  Johnny Willemsen  <jwillemsen@remedy.nl>

        * examples/Reactor/TP_Reactor/*:
          Example program demonstrating the TP_Reactor. This is also a
          test program for bugzilla bug 2395. For reproducing start
          the perl script but also do other activity on the system,
          then it can happen that the same event handler is invoked
          from multiple threads. Thanks to Martin Kolleck and Tino
          Riethmueller for creating this example program.
Comment 3 Johnny Willemsen 2006-03-10 09:59:23 CST
got a server crash right now. The message is slighly changed, the count is now
the thread that is in handle input already, 1 is the count, the number at the
end is our thread id.

................................................................................
                                                                            
................................................................................
                                                                            
................................................................................
                                                                            
................................................................................
                                                                            
.................Multiple invocations detected: handle 7, count 1132489056, refc
                                                                              1
1124096352.


.........ReadHandler.cpp:125: Failed to receive data.(errno = 14: Bad address)
ReadHandler.cpp:125: Failed to receive data.(errno = 14: Bad address)
...........................................ReadHandler.cpp:125: Failed to receiv
                                                                             e
data.(errno = 14: Bad address)
*** glibc detected *** ./server: free(): invalid next size (normal): 0x00002ab20
                                                                            
7d00920 ***
======= Backtrace: =========
/lib64/libc.so.6[0x2b38a2b8933e]
/lib64/libc.so.6(__libc_free+0x6c)[0x2b38a2b8a95c]
./server[0x40bcd0]
./server[0x40badf]
/home/build/ACE/gcc41/ACE_wrappers/lib/libACE.so.5.5.0(_ZN37ACE_Select_Reactor_H
                                                                            
andler_Repository6unbindEim+0x20f)[0x2b38a2202b2f]
/home/build/ACE/gcc41/ACE_wrappers/lib/libACE.so.5.5.0[0x2b38a21b32b3]
/home/build/ACE/gcc41/ACE_wrappers/lib/libACE.so.5.5.0(_ZN14ACE_TP_Reactor20hand
                                                                            
le_socket_eventsERiR18ACE_TP_Token_Guard+0xef)[0x2b38a221bfaf]
/home/build/ACE/gcc41/ACE_wrappers/lib/libACE.so.5.5.0(_ZN14ACE_TP_Reactor10disp
                                                                            
atch_iEP14ACE_Time_ValueR18ACE_TP_Token_Guard+0x82)[0x2b38a221c0d2]
/home/build/ACE/gcc41/ACE_wrappers/lib/libACE.so.5.5.0(_ZN14ACE_TP_Reactor13hand
                                                                            
le_eventsEP14ACE_Time_Value+0xb6)[0x2b38a221c1a6]
/home/build/ACE/gcc41/ACE_wrappers/lib/libACE.so.5.5.0(_ZN11ACE_Reactor22run_rea
                                                                            
ctor_event_loopEPFiPS_E+0x65)[0x2b38a21ff545]
./server(__gxx_personality_v0+0x174)[0x403a2c]
/home/build/ACE/gcc41/ACE_wrappers/lib/libACE.so.5.5.0(_ZN18ACE_Thread_Adapter6i
                                                                            
nvokeEv+0x57)[0x2b38a2213d47]
/lib64/libpthread.so.0[0x2b38a249f223]
/lib64/libc.so.6(__clone+0x6d)[0x2b38a2bdc73d]
======= Memory map: ========
00400000-00410000 r-xp 00000000 08:06 204548                             /home/b
                                                                            
uild/ACE/gcc41/ACE_wrappers/examples/Reactor/TP_Reactor/server
00510000-00511000 rw-p 00010000 08:06 204548                             /home/b
                                                                            
uild/ACE/gcc41/ACE_wrappers/examples/Reactor/TP_Reactor/server
00511000-00532000 rw-p 00511000 00:00 0                                  [heap]
40000000-40001000 ---p 40000000 00:00 0
40001000-40801000 rw-p 40001000 00:00 0
40801000-40802000 ---p 40801000 00:00 0
40802000-41002000 rw-p 40802000 00:00 0
41002000-41003000 ---p 41002000 00:00 0
41003000-41803000 rw-p 41003000 00:00 0
41803000-41804000 ---p 41803000 00:00 0
41804000-42004000 rw-p 41804000 00:00 0
42004000-42005000 ---p 42004000 00:00 0
42005000-42805000 rw-p 42005000 00:00 0
42805000-42806000 ---p 42805000 00:00 0
42806000-43006000 rw-p 42806000 00:00 0
43006000-43007000 ---p 43006000 00:00 0
43007000-43807000 rw-p 43007000 00:00 0
43807000-43808000 ---p 43807000 00:00 0
43808000-44008000 rw-p 43808000 00:00 0
44008000-44009000 ---p 44008000 00:00 0
44009000-44809000 rw-p 44009000 00:00 0
2ab1fdc22000-2ab202c23000 rw-p 2ab1fdc22000 00:00 0
2ab207d00000-2ab207d21000 rw-p 2ab207d00000 00:00 0
2ab207d21000-2ab207e00000 ---p 2ab207d21000 00:00 0
2ab207e00000-2ab20ce01000 rw-p 2ab207e00000 00:00 0
2b38a1fe2000-2b38a1ffd000 r-xp 00000000 08:06 10744                      /lib64/
                                                                            
ld-2.3.90.so
2b38a1ffd000-2b38a1fff000 rw-p 2b38a1ffd000 00:00 0
2b38a20fc000-2b38a20fe000 rw-p 0001a000 08:06 10744                      /lib64/
                                                                            
ld-2.3.90.so
2b38a20fe000-2b38a225f000 r-xp 00000000 08:06 331686                     /home/b
                                                                            
uild/ACE/gcc41/ACE_wrappers/ace/libACE.so.5.5.0
2b38a225f000-2b38a235e000 ---p 00161000 08:06 331686                     /home/b
                                                                            
uild/ACE/gcc41/ACE_wrappers/ace/libACE.so.5.5.0
2b38a235e000-2b38a2373000 rw-p 00160000 08:06 331686                     /home/b
                                                                            
uild/ACE/gcc41/ACE_wrappers/ace/libACE.so.5.5.0
2b38a2373000-2b38a2374000 rw-p 2b38a2373000 00:00 0
2b38a2395000-2b38a2397000 r-xp 00000000 08:06 10757                      /lib64/
                                                                            
libdl-2.3.90.so
2b38a2397000-2b38a2497000 ---p 00002000 08:06 10757                      /lib64/
                                                                            
libdl-2.3.90.so
2b38a2497000-2b38a2499000 rw-p 00002000 08:06 10757                      /lib64/
                                                                            
libdl-2.3.90.so
2b38a2499000-2b38a24a9000 r-xp 00000000 08:06 10777                      /lib64/
                                                                            
libpthread-2.3.90.so
2b38a24a9000-2b38a25a9000 ---p 00010000 08:06 10777                      /lib64/
                                                                            
libpthread-2.3.90.so
2b38a25a9000-2b38a25ab000 rw-p 00010000 08:06 10777                      /lib64/
                                                                            
libpthread-2.3.90.so
2b38a25ab000-2b38a25af000 rw-p 2b38a25ab000 00:00 0
2b38a25af000-2b38a25b7000 r-xp 00000000 08:06 10781                      /lib64/
                                                                            
librt-2.3.90.so
2b38a25b7000-2b38a26b6000 ---p 00008000 08Aborted
Comment 4 Johnny Willemsen 2006-03-10 10:13:22 CST
another crash. now printed at the start the tread ids

make[1]: Leaving directory
`/home/build/ACE/gcc41/ACE_wrappers/examples/Reactor/TP_Reactor'
build@elrond:~/ACE/gcc41/ACE_wrappers/examples/Reactor/TP_Reactor> ./server
Starting thread 1082132832
Starting thread 1090525536
Starting thread 1098918240
Starting thread 1107310944
Starting thread 1115703648
Starting thread 1124096352
Starting thread 1132489056
Starting thread 1140881760
Starting thread 1149274464
..............^[[A.............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................Multiple
invocations detected: handle 6, count 1107310944, refc 1 1115703648.
ReadHandler.cpp:141: We should not get here.ReadHandler.cpp:125: Failed to
receive data.(errno = 9: Bad file descriptor)
.............................*** glibc detected *** ./server: free(): invalid
next size (fast): 0x00002ab2fa5008d0 ***
======= Backtrace: =========
/lib64/libc.so.6[0x2ad42a8ae33e]
/lib64/libc.so.6(__libc_free+0x6c)[0x2ad42a8af95c]
./server[0x40bb2f]
/home/build/ACE/gcc41/ACE_wrappers/lib/libACE.so.5.5.0(_ZN37ACE_Select_Reactor_Handler_Repository6unbindEim+0x20f)[0x2ad429f27b2f]
/home/build/ACE/gcc41/ACE_wrappers/lib/libACE.so.5.5.0[0x2ad429ed82b3]
/home/build/ACE/gcc41/ACE_wrappers/lib/libACE.so.5.5.0(_ZN14ACE_TP_Reactor20handle_socket_eventsERiR18ACE_TP_Token_Guard+0xef)[0x2ad429f40faf]
/home/build/ACE/gcc41/ACE_wrappers/lib/libACE.so.5.5.0(_ZN14ACE_TP_Reactor10dispatch_iEP14ACE_Time_ValueR18ACE_TP_Token_Guard+0x82)[0x2ad429f410d2]
/home/build/ACE/gcc41/ACE_wrappers/lib/libACE.so.5.5.0(_ZN14ACE_TP_Reactor13handle_eventsEP14ACE_Time_Value+0xb6)[0x2ad429f411a6]
/home/build/ACE/gcc41/ACE_wrappers/lib/libACE.so.5.5.0(_ZN11ACE_Reactor22run_reactor_event_loopEPFiPS_E+0x65)[0x2ad429f24545]
./server(__gxx_personality_v0+0x224)[0x403adc]
/home/build/ACE/gcc41/ACE_wrappers/lib/libACE.so.5.5.0(_ZN18ACE_Thread_Adapter6invokeEv+0x57)[0x2ad429f38d47]
/lib64/libpthread.so.0[0x2ad42a1c4223]
Comment 5 Johnny Willemsen 2006-03-13 04:46:48 CST
Running with ACE_HAS_DUMP gives the output below. For handle 6 we get reinvoked,
what is really strange is that handle 6 is not in the suspend set, but in the
normal wait set

Starting thread 1082132832
Starting thread 1090525536
Starting thread 1098918240
Starting thread 1107310944
Starting thread 1115703648
Starting thread 1124096352
Starting thread 1132489056
Starting thread 1140881760
Starting thread 1149274464
...........
====
(8045|1124096352|544f1d70)
.
====
(8045|1124096352|514790)

max_size_ = 32
cur_size_ = 0
cur_limbo_= 0
ids_curr_ = 0
min_free_ = 0
heap_ =

timer_ids_ =
0       -1
1       -1
2       -1
3       -1
4       -1
5       -1
6       -1
7       -1
8       -1
9       -1
10      -1
11      -1
12      -1
13      -1
14      -1
15      -1
16      -1
17      -1
18      -1
19      -1
20      -1
21      -1
22      -1
23      -1
24      -1
25      -1
26      -1
27      -1
28      -1
29      -1
30      -1
31      -1
====

====
(8045|1124096352|544f1d78)
max_handlep1_ = 8, max_size_ = 1024
[ (event_handler = 514b50, event_handler->handle_ = -1)
 (event_handler = 516bb0, event_handler->handle_ = 5)
 (event_handler = 517ab0, event_handler->handle_ = 6)
 (event_handler = 5181a0, event_handler->handle_ = 7)
 ]====
delete_signal_handler_ = 1
read_handle = 3
read_handle = 5
read_handle = 6
read_handle_suspend = 7
restart_ = 0
requeue_position_ = -1
initialized_ = 1
owner_ = 1462938800

====
(8045|1124096352|514b50)
select_reactor_ = 544f1d70
====
(8045|1124096352|514b88)
handles_[0] = 3
handles_[1] = 4
====
====

====
(8045|1124096352|544f2490)

====
====
Multiple invocations detected: 517ab0 2 handle 6:6, count 1115703648, refc 1
1124096352.
got call counter 2
ReadHandler.cpp:144: We should not get here.ReadHandler.cpp:126: Failed to
receive data.(errno = 14: Bad address)
...........ReadHandler.cpp:126: Failed to receive data.(errno = 11: Resource
temporarily unavailable)
............
Comment 6 Johnny Willemsen 2006-03-13 05:47:54 CST
gdb thread info at the moment we have two threads in the same event handler, t2
and t8 (the top two) are both called for handle 7.

(gdb) t 2
[Switching to thread 2 (Thread 1149274464 (LWP 11188))]#0 
ACE_Select_Reactor_Handler_Repository::dump (this=0x7fff97fe4868) at
Select_Reactor_Base.cpp:520
520     ACE_Select_Reactor_Handler_Repository::dump (void) const
(gdb) back
#0  ACE_Select_Reactor_Handler_Repository::dump (this=0x7fff97fe4868) at
Select_Reactor_Base.cpp:520
#1  0x00002ba512c91614 in ACE_Select_Reactor_T<ACE_Reactor_Token_T<ACE_Token>
>::dump (this=0x7fff97fe4860) at Select_Reactor_T.cpp:1525
#2  0x000000000040cc8b in ReadHandler::handle_input (this=0x5181a0, h=7) at
ReadHandler.cpp:61
#3  0x00002ba512d050d8 in ACE_TP_Reactor::dispatch_socket_event
(this=0x7fff97fe4860, dispatch_info=<value optimized out>) at TP_Reactor.cpp:660
#4  0x00002ba512d0609f in ACE_TP_Reactor::handle_socket_events
(this=0x7fff97fe4860, event_count=@0x448080ec, guard=@0x44808130) at
TP_Reactor.cpp:511
#5  0x00002ba512d061c2 in ACE_TP_Reactor::dispatch_i (this=0x7fff97fe4860,
max_wait_time=<value optimized out>, guard=@0x44808130) at TP_Reactor.cpp:314
#6  0x00002ba512d06296 in ACE_TP_Reactor::handle_events (this=0x7fff97fe4860,
max_wait_time=0x0) at TP_Reactor.cpp:174
#7  0x00002ba512ce84b5 in ACE_Reactor::run_reactor_event_loop
(this=0x7fff97fe5020, eh=0) at Reactor.cpp:233
#8  0x0000000000403cdc in threadFunc (arg=0x7fff97fe5020) at server.cpp:28
#9  0x00002ba512cfd5b7 in ACE_Thread_Adapter::invoke (this=0x5181a0) at
Thread_Adapter.cpp:95
#10 0x00002ba512f8c223 in start_thread () from /lib64/libpthread.so.0
#11 0x00002ba5136c973d in clone () from /lib64/libc.so.6
#12 0x0000000000000000 in ?? ()
(gdb) t 8
[Switching to thread 8 (Thread 1098918240 (LWP 11182))]#0  0x00002ba512f91bfb in
__read_nocancel () from /lib64/libpthread.so.0
(gdb) back
#0  0x00002ba512f91bfb in __read_nocancel () from /lib64/libpthread.so.0
#1  0x00002ba512c87214 in ACE::recv_n_i (handle=7, buf=0x2ac6c73cc010,
len=104857600, timeout=0x511d00, bt=<value optimized out>) at OS_NS_unistd.inl:900
#2  0x000000000040cd33 in ReadHandler::handle_input (this=0x5181a0, h=7) at
ACE.inl:107
#3  0x00002ba512d050d8 in ACE_TP_Reactor::dispatch_socket_event
(this=0x7fff97fe4860, dispatch_info=<value optimized out>) at TP_Reactor.cpp:660
#4  0x00002ba512d0609f in ACE_TP_Reactor::handle_socket_events
(this=0x7fff97fe4860, event_count=@0x418020ec, guard=@0x41802130) at
TP_Reactor.cpp:511
#5  0x00002ba512d061c2 in ACE_TP_Reactor::dispatch_i (this=0x7fff97fe4860,
max_wait_time=<value optimized out>, guard=@0x41802130) at TP_Reactor.cpp:314
#6  0x00002ba512d06296 in ACE_TP_Reactor::handle_events (this=0x7fff97fe4860,
max_wait_time=0x0) at TP_Reactor.cpp:174
#7  0x00002ba512ce84b5 in ACE_Reactor::run_reactor_event_loop
(this=0x7fff97fe5020, eh=0) at Reactor.cpp:233
#8  0x0000000000403cdc in threadFunc (arg=0x7fff97fe5020) at server.cpp:28
#9  0x00002ba512cfd5b7 in ACE_Thread_Adapter::invoke (this=0x5173c0) at
Thread_Adapter.cpp:95
#10 0x00002ba512f8c223 in start_thread () from /lib64/libpthread.so.0
#11 0x00002ba5136c973d in clone () from /lib64/libc.so.6
#12 0x0000000000000000 in ?? ()
(gdb) t 1
[Switching to thread 1 (Thread 47987996994736 (LWP 11177))]#0 
0x00002ba512f8f5e6 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) back
#0  0x00002ba512f8f5e6 in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00002ba512ca1e45 in ACE_Condition_Thread_Mutex::wait (this=0x516c8c,
mutex=@0x0, abstime=0x1) at OS_NS_Thread.inl:452
#2  0x00002ba512d01851 in ACE_Thread_Manager::wait (this=0x516c00, timeout=0x0,
abandon_detached_threads=0) at Thread_Manager.cpp:1791
#3  0x0000000000403de8 in main () at server.cpp:61
(gdb) t 3
[Switching to thread 3 (Thread 1140881760 (LWP 11187))]#0  0x00002ba512f8f5e6 in
pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) back
#0  0x00002ba512f8f5e6 in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00002ba512ca1e45 in ACE_Condition_Thread_Mutex::wait (this=0x44007024,
mutex=@0x0, abstime=0x1) at OS_NS_Thread.inl:452
#2  0x00002ba512d04da7 in ACE_Token::shared_acquire (this=0x7fff97fe4f80,
sleep_hook_func=0, arg=0x0, timeout=0x0, op_type=<value optimized out>)
    at Token.inl:129
#3  0x00002ba512d05712 in ACE_TP_Token_Guard::acquire_token (this=0x44007130,
max_wait_time=0x0) at TP_Reactor.cpp:82
#4  0x00002ba512d06230 in ACE_TP_Reactor::handle_events (this=0x7fff97fe4860,
max_wait_time=0x0) at TP_Reactor.cpp:160
#5  0x00002ba512ce84b5 in ACE_Reactor::run_reactor_event_loop
(this=0x7fff97fe5020, eh=0) at Reactor.cpp:233
#6  0x0000000000403cdc in threadFunc (arg=0x7fff97fe5020) at server.cpp:28
#7  0x00002ba512cfd5b7 in ACE_Thread_Adapter::invoke (this=0x517f50) at
Thread_Adapter.cpp:95
#8  0x00002ba512f8c223 in start_thread () from /lib64/libpthread.so.0
#9  0x00002ba5136c973d in clone () from /lib64/libc.so.6
#10 0x0000000000000000 in ?? ()
(gdb) t 4
[Switching to thread 4 (Thread 1132489056 (LWP 11186))]#0  0x00002ba512f8f5e6 in
pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) back
#0  0x00002ba512f8f5e6 in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00002ba512ca1e45 in ACE_Condition_Thread_Mutex::wait (this=0x43806024,
mutex=@0x0, abstime=0x1) at OS_NS_Thread.inl:452
#2  0x00002ba512d04da7 in ACE_Token::shared_acquire (this=0x7fff97fe4f80,
sleep_hook_func=0, arg=0x0, timeout=0x0, op_type=<value optimized out>)
    at Token.inl:129
#3  0x00002ba512d05712 in ACE_TP_Token_Guard::acquire_token (this=0x43806130,
max_wait_time=0x0) at TP_Reactor.cpp:82
#4  0x00002ba512d06230 in ACE_TP_Reactor::handle_events (this=0x7fff97fe4860,
max_wait_time=0x0) at TP_Reactor.cpp:160
#5  0x00002ba512ce84b5 in ACE_Reactor::run_reactor_event_loop
(this=0x7fff97fe5020, eh=0) at Reactor.cpp:233
#6  0x0000000000403cdc in threadFunc (arg=0x7fff97fe5020) at server.cpp:28
#7  0x00002ba512cfd5b7 in ACE_Thread_Adapter::invoke (this=0x517d00) at
Thread_Adapter.cpp:95
#8  0x00002ba512f8c223 in start_thread () from /lib64/libpthread.so.0
#9  0x00002ba5136c973d in clone () from /lib64/libc.so.6
#10 0x0000000000000000 in ?? ()
(gdb) t 5
[Switching to thread 5 (Thread 1124096352 (LWP 11185))]#0  0x00002ba5136c3652 in
__select_nocancel () from /lib64/libc.so.6
(gdb) back
#0  0x00002ba5136c3652 in __select_nocancel () from /lib64/libc.so.6
#1  0x00002ba512c95d51 in ACE_Select_Reactor_T<ACE_Reactor_Token_T<ACE_Token>
>::wait_for_multiple_events (this=dwarf2_read_address: Corrupted DWARF expression.
) at OS_NS_sys_select.inl:42
#2  0x00002ba512d0615e in ACE_TP_Reactor::dispatch_i (this=0x8,
max_wait_time=0x7fff97fe4da8, guard=@0x0) at TP_Reactor.cpp:256
#3  0x00002ba512d06296 in ACE_TP_Reactor::handle_events (this=0x7fff97fe4860,
max_wait_time=0x0) at TP_Reactor.cpp:174
#4  0x00002ba512ce84b5 in ACE_Reactor::run_reactor_event_loop
(this=0x7fff97fe5020, eh=0) at Reactor.cpp:233
#5  0x0000000000403cdc in threadFunc (arg=0x7fff97fe5020) at server.cpp:28
#6  0x00002ba512cfd5b7 in ACE_Thread_Adapter::invoke (this=0x517ab0) at
Thread_Adapter.cpp:95
#7  0x00002ba512f8c223 in start_thread () from /lib64/libpthread.so.0
#8  0x00002ba5136c973d in clone () from /lib64/libc.so.6
#9  0x0000000000000000 in ?? ()
(gdb) t 6
[Switching to thread 6 (Thread 1115703648 (LWP 11184))]#0  0x00002ba512f8f5e6 in
pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) back
#0  0x00002ba512f8f5e6 in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00002ba512ca1e45 in ACE_Condition_Thread_Mutex::wait (this=0x42804024,
mutex=@0x0, abstime=0x1) at OS_NS_Thread.inl:452
#2  0x00002ba512d04da7 in ACE_Token::shared_acquire (this=0x7fff97fe4f80,
sleep_hook_func=0, arg=0x0, timeout=0x0, op_type=<value optimized out>)
    at Token.inl:129
#3  0x00002ba512d05712 in ACE_TP_Token_Guard::acquire_token (this=0x42804130,
max_wait_time=0x0) at TP_Reactor.cpp:82
#4  0x00002ba512d06230 in ACE_TP_Reactor::handle_events (this=0x7fff97fe4860,
max_wait_time=0x0) at TP_Reactor.cpp:160
#5  0x00002ba512ce84b5 in ACE_Reactor::run_reactor_event_loop
(this=0x7fff97fe5020, eh=0) at Reactor.cpp:233
#6  0x0000000000403cdc in threadFunc (arg=0x7fff97fe5020) at server.cpp:28
#7  0x00002ba512cfd5b7 in ACE_Thread_Adapter::invoke (this=0x517860) at
Thread_Adapter.cpp:95
#8  0x00002ba512f8c223 in start_thread () from /lib64/libpthread.so.0
#9  0x00002ba5136c973d in clone () from /lib64/libc.so.6
#10 0x0000000000000000 in ?? ()
(gdb) t 7
[Switching to thread 7 (Thread 1107310944 (LWP 11183))]#0  0x00002ba512f91a49 in
__lll_mutex_unlock_wake () from /lib64/libpthread.so.0
(gdb) back
#0  0x00002ba512f91a49 in __lll_mutex_unlock_wake () from /lib64/libpthread.so.0
#1  0x0000000000000001 in ?? ()
#2  0x0000000000000001 in ?? ()
#3  0x00002ba512f8ebb9 in pthread_mutex_unlock () from /lib64/libpthread.so.0
#4  0x00002ba512d04c95 in ACE_Token::shared_acquire (this=0x7fff97fe4fa8,
sleep_hook_func=0, arg=0x0, timeout=0x0, op_type=2550026112)
    at OS_NS_Thread.inl:3713
#5  0x00002ba512c97bdb in ACE_Select_Reactor_T<ACE_Reactor_Token_T<ACE_Token>
>::resume_handler (this=0x7fff97fe4f80, handle=-1744943008) at Guard_T.inl:29
#6  0x00002ba512d060c8 in ACE_TP_Reactor::handle_socket_events
(this=0x7fff97fe4860, event_count=@0x0, guard=@0x42003130) at TP_Reactor.cpp:517
#7  0x00002ba512d061c2 in ACE_TP_Reactor::dispatch_i (this=0x7fff97fe4860,
max_wait_time=<value optimized out>, guard=@0x42003130) at TP_Reactor.cpp:314
#8  0x00002ba512d06296 in ACE_TP_Reactor::handle_events (this=0x7fff97fe4860,
max_wait_time=0x0) at TP_Reactor.cpp:174
#9  0x00002ba512ce84b5 in ACE_Reactor::run_reactor_event_loop
(this=0x7fff97fe5020, eh=0) at Reactor.cpp:233
#10 0x0000000000403cdc in threadFunc (arg=0x7fff97fe5020) at server.cpp:28
#11 0x00002ba512cfd5b7 in ACE_Thread_Adapter::invoke (this=0x517610) at
Thread_Adapter.cpp:95
#12 0x00002ba512f8c223 in start_thread () from /lib64/libpthread.so.0
#13 0x00002ba5136c973d in clone () from /lib64/libc.so.6
#14 0x0000000000000000 in ?? ()
(gdb) t 9
[Switching to thread 9 (Thread 1090525536 (LWP 11181))]#0  0x00002ba512f8f5e6 in
pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) back
#0  0x00002ba512f8f5e6 in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00002ba512ca1e45 in ACE_Condition_Thread_Mutex::wait (this=0x41001024,
mutex=@0x0, abstime=0x1) at OS_NS_Thread.inl:452
#2  0x00002ba512d04da7 in ACE_Token::shared_acquire (this=0x7fff97fe4f80,
sleep_hook_func=0, arg=0x0, timeout=0x0, op_type=<value optimized out>)
    at Token.inl:129
#3  0x00002ba512d05712 in ACE_TP_Token_Guard::acquire_token (this=0x41001130,
max_wait_time=0x0) at TP_Reactor.cpp:82
#4  0x00002ba512d06230 in ACE_TP_Reactor::handle_events (this=0x7fff97fe4860,
max_wait_time=0x0) at TP_Reactor.cpp:160
#5  0x00002ba512ce84b5 in ACE_Reactor::run_reactor_event_loop
(this=0x7fff97fe5020, eh=0) at Reactor.cpp:233
#6  0x0000000000403cdc in threadFunc (arg=0x7fff97fe5020) at server.cpp:28
#7  0x00002ba512cfd5b7 in ACE_Thread_Adapter::invoke (this=0x517170) at
Thread_Adapter.cpp:95
#8  0x00002ba512f8c223 in start_thread () from /lib64/libpthread.so.0
#9  0x00002ba5136c973d in clone () from /lib64/libc.so.6
#10 0x0000000000000000 in ?? ()
(gdb) t 10
[Switching to thread 10 (Thread 1082132832 (LWP 11180))]#0  0x00002ba5136c3652
in __select_nocancel () from /lib64/libc.so.6
(gdb) b
Breakpoint 2 at 0x2ba5136c3652
(gdb) back
#0  0x00002ba5136c3652 in __select_nocancel () from /lib64/libc.so.6
#1  0x00002ba512c85ee3 in ACE::handle_ready (handle=6, timeout=0x511d00,
read_ready=1, write_ready=0, exception_ready=0) at OS_NS_sys_select.inl:42
#2  0x00002ba512c87247 in ACE::recv_n_i (handle=6, buf=0x2ac6c0fcb010,
len=104857600, timeout=0x511d00, bt=<value optimized out>) at ACE.inl:248
#3  0x000000000040cd33 in ReadHandler::handle_input (this=0x517f50, h=6) at
ACE.inl:107
#4  0x00002ba512d050d8 in ACE_TP_Reactor::dispatch_socket_event
(this=0x7fff97fe4860, dispatch_info=<value optimized out>) at TP_Reactor.cpp:660
#5  0x00002ba512d0609f in ACE_TP_Reactor::handle_socket_events
(this=0x7fff97fe4860, event_count=@0x408000ec, guard=@0x40800130) at
TP_Reactor.cpp:511
#6  0x00002ba512d061c2 in ACE_TP_Reactor::dispatch_i (this=0x7fff97fe4860,
max_wait_time=<value optimized out>, guard=@0x40800130) at TP_Reactor.cpp:314
#7  0x00002ba512d06296 in ACE_TP_Reactor::handle_events (this=0x7fff97fe4860,
max_wait_time=0x0) at TP_Reactor.cpp:174
#8  0x00002ba512ce84b5 in ACE_Reactor::run_reactor_event_loop
(this=0x7fff97fe5020, eh=0) at Reactor.cpp:233
#9  0x0000000000403cdc in threadFunc (arg=0x7fff97fe5020) at server.cpp:28
#10 0x00002ba512cfd5b7 in ACE_Thread_Adapter::invoke (this=0x516f20) at
Thread_Adapter.cpp:95
#11 0x00002ba512f8c223 in start_thread () from /lib64/libpthread.so.0
#12 0x00002ba5136c973d in clone () from /lib64/libc.so.6
#13 0x0000000000000000 in ?? ()
(gdb) t 11
Thread ID 11 not known.
(
Comment 7 Johnny Willemsen 2006-03-15 09:05:43 CST
We have a client that connects to a server, sends first the number of bytes the
client want to send to the server, the server acks this, then the client sends
the bytes to the server (100Mb in the test program), the server acks the receive
and closes the connection. At the moment the client has again data for the
server it opens a new connection, sends the data and things close again. In the
test program we have 2 clients each just doing its send in a endless loop.

Let us now go to the point that the server is receiving the 100Mb of the client
on handle 8. At the moment we have received the data we send the ack back and
the handle_input returns -1. The returning of -1 means we want to close this
handle, so the reactor removes the handle 8 out of it fd_sets and calls
handle_close with just the read_mask and then the handler deletes itself, as a
result the ACE_SOCK_Stream is destructed and we close the socket connection. The
handler is now destructed and completely out of the fd_sets (also removed from
the suspended set). We now come back in ACE_TP_Reactor::handle_socket_events and
there we always try to resume the handle, this in 99.99999% of the cases doesn't
give side effects, we grab the reactor token, look in the suspend set if handle
8 is there, if it is, we resume it, if not we just do nothing. Because the
handle 8 is fully removed it is not in the suspend set and things just go well.

But now the race condition. After we have called handle_close on the event
handler we have to grab the reactor token to be able to resume it. In this time
it can be that there is a new connection established for a client which then can
use handle 8 again (which is freed in the past and can be reused by the OS).
Another thread takes the connection and calls handle_input for it. In this step
of doing this handle 8 is now suspended. When now the thread that has closed the
handle 8 previously has the reactor token, it tries to resume handle 8, and now
it is in the suspend_set (but really for a different connection) and then this
thread resumes it again, resulting in the fact that the handle is resumed but
another thread is calling handle_input at this moment and then things go wrong.
Comment 8 Johnny Willemsen 2006-03-15 09:05:55 CST
taking this over
Comment 9 Johnny Willemsen 2006-03-17 02:58:34 CST
Fri Mar 17 08:56:12 UTC 2006  Johnny Willemsen  <jwillemsen@remedy.nl>

        * ace/TP_Reactor.{h,cpp,inl}:
          Fixed bugzilla #2395. This was about a race condition in the
          TP_Reactor, when the handle_x method returns -1 the handler
          is removed and after that we try to resume. Normally this resume
          is a noop because for the handle there is no event handler anymore.
          But to resume the handle we need to reacquire the lock on the
          reactor, in the time between the remove and the reacquire we could
          have received a new connection for which the handle is used which
          we already freed and then we can resume this handle but then for
          a new eventhandler. The fix is to do the remove and resume as atomic
          operation. Thanks to Bala Natarajan and Steve Huston for reviewing
          the patches and Kees van Marle for debugging and analyzing this
          problem.
Comment 10 Johnny Willemsen 2006-03-21 04:51:57 CST
a small performance problem was fixed.

Mon Mar 20 10:15:12 UTC 2006  Johnny Willemsen  <jwillemsen@remedy.nl>

        * ace/TP_Reactor.cpp (post_process_socket_event):
          Before acquiring the token, first check if we really have to do some
          post processing, if not, then we just don't acquire the token. This
          fixes the performance drop in the TAO thread pool performance test.
          Thanks to Kobi Cohen-Arazi <kobi dot cohenarazi at gmail dot com>
          for noticing this drop in the performance stats.

        * ace/TP_Reactor.h:
          Small documentation improvement