Bug 1074

Summary: TP_Reactor and Transport_Cache_Manager race to close a connection.
Product: TAO
Reporter: gclark
Component: ORB
Assignee: DOC Center Support List (internal) <tao-support>
Status: RESOLVED DUPLICATE    
Severity: normal    
Priority: P3    
Version: 1.2   
Hardware: All   
OS: All   
Attachments: Test case.
Same test case, this time with the correct mime type.

Description gclark 2001-10-23 18:55:19 CDT
While responding to a socket event that signals the closure of an outgoing
connection, the TP_Reactor may access the IIOP connection handler at a time
when neither the handler nor its associated Transport object is locked or
marked busy in any way.  If the transport cache manager concurrently selects
this connection for purging, the handler can be deleted while the reactor is
still about to use it.


FAILURE SCENARIO:
-----------------

1.  Server "A" establishes a connection to some remote server "B".
2.  B decides to close the connection at a time when it is quiescent.
3.  In A, the closure of the connection causes the socket to become ready
    for reading.
4.  Hence the leader thread returns from select() and calls
      ACE_TP_Reactor::handle_socket_events().
5.  handle_socket_events() locates the IIOP_Connection_Handler associated
    with the ready file handle.
6.  handle_socket_events() releases the reactor token.
    (Note that at this point, the connection is marked
    "ACE_RECYCLABLE_IDLE_AND_PURGABLE".)
7.  Another thread makes a method invocation on a remote object, causing
    the transport cache manager to purge the connection found in step 5.
    This deletes the IIOP_Connection_Handler.
8.  Back in the leader thread, handle_socket_events() then tries to invoke
    a callback method on the deleted IIOP_Connection_Handler.
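
The interleaving in steps 5 to 8 can be replayed deterministically in a small
sketch.  This is not TAO code: Handler, the cache map, and
upcall_hits_deleted_handler() are hypothetical stand-ins, and the file handle
value is arbitrary.  The real reactor holds a plain pointer that cannot tell
whether its target still exists; the std::weak_ptr here is only an
instrument that lets the sketch observe the deletion without touching freed
memory.

```cpp
#include <cassert>
#include <map>
#include <memory>

// Hypothetical stand-in for IIOP_Connection_Handler (not the TAO class).
struct Handler {
    explicit Handler(int h) : handle(h) {}
    int handle;
};

// Replays the failure scenario on a single thread so the outcome is
// deterministic. Returns true if the handler found in step 5 no longer
// exists by the time the upcall of step 8 would be made.
bool upcall_hits_deleted_handler(bool purge_in_window) {
    // Stand-in for the transport cache, which owns the handlers.
    std::map<int, std::shared_ptr<Handler>> cache;
    const int fd = 7;  // arbitrary handle value, assumed for illustration

    // Step 1: A opens a connection to B; the handler is cached.
    cache[fd] = std::make_shared<Handler>(fd);

    // Steps 2-4: B closes the connection, select() reports fd as readable.

    // Step 5: the leader thread locates the handler for the ready handle.
    // The real reactor keeps a raw pointer; weak_ptr only lets us observe.
    std::weak_ptr<Handler> found = cache.at(fd);
    assert(!found.expired());

    // Step 6: the reactor token is released. Nothing marks the handler busy.

    // Step 7: another thread's invocation makes the cache purge the idle
    // connection, which deletes the handler.
    if (purge_in_window) {
        cache.erase(fd);
    }

    // Step 8: the leader thread is about to invoke the callback.
    return found.expired();
}
```

Calling upcall_hits_deleted_handler(true) reports that the handler is gone
before the callback, while upcall_hits_deleted_handler(false) reports that it
is still alive, which is why the failure only shows up when the purge lands in
the window between steps 6 and 8.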

The attached program demonstrates the bug.

I expect the bug affects all platforms, but I have tested the program on 
Windows 2000 only.



POSSIBLE FIX:
-------------

I don't have a fix or workaround yet.  We are considering replacing the
pending_upcalls counter with a more general reference count in the
ACE_Event_Handler object.  The transport cache manager and the
reactor would "hold" references to the ACE_Event_Handler (which is
the IIOP_Connection_Handler) for appropriate periods to prevent the
race.
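
A minimal sketch of the reference-counting idea, replaying steps 5 to 8 of the
failure scenario.  It is only an illustration under assumptions:
RefCountedHandler, add_ref(), release(), and upcall_safe_with_refcount() are
hypothetical names, not the ACE_Event_Handler interface, and the sketch
assumes the reactor takes its reference before releasing the token and that
the cache manager purges by dropping its reference rather than deleting
directly.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical reference-counted handler (a stand-in, not ACE_Event_Handler).
class RefCountedHandler {
public:
    // The creator (here, the cache) holds the first reference.
    explicit RefCountedHandler(bool* destroyed_flag)
        : refs_(1), destroyed_(destroyed_flag), upcalls_(0) {}

    void add_ref() { refs_.fetch_add(1, std::memory_order_relaxed); }

    // Drops one reference; whoever drops the last one deletes the object.
    void release() {
        if (refs_.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            delete this;
        }
    }

    // Stand-in for the reactor callback; returns the number of upcalls so far.
    int handle_input() { return ++upcalls_; }

private:
    // Private so that destruction can only happen through release().
    ~RefCountedHandler() { *destroyed_ = true; }

    std::atomic<int> refs_;
    bool* destroyed_;  // lets the caller observe when destruction happens
    int upcalls_;
};

// Returns true if the handler survived the purge until the upcall finished
// and was destroyed only after the reactor dropped its reference.
bool upcall_safe_with_refcount() {
    bool destroyed = false;
    RefCountedHandler* h = new RefCountedHandler(&destroyed);  // cache's ref

    h->add_ref();   // step 5: reactor takes a reference while holding the token
                    // step 6: token released
    h->release();   // step 7: cache purges by dropping its reference

    const bool alive_at_upcall = !destroyed;
    h->handle_input();  // step 8: the upcall runs on a live object

    h->release();   // reactor is done; last reference, object deleted here
    return alive_at_upcall && destroyed;
}
```

In this arrangement the purge no longer decides when the handler dies; the
last holder to call release() does, so the order in which the cache manager
and the reactor finish does not matter.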
Comment 1 gclark 2001-10-23 18:56:03 CDT
Created attachment 92
Test case.
Comment 2 Nanbor Wang 2001-10-23 19:02:50 CDT
IMHO, it is a duplicate of bug 1020. We have a bugzilla id #1031 which talks 
about the improvements that we want to make to the TP_Reactor. Can we use your 
testcase for bug 1020?
Comment 3 Nanbor Wang 2001-10-23 19:10:55 CDT
I am sort of convinced that this is more or less a duplicate of 1031 (and hence 1020).

*** This bug has been marked as a duplicate of 1031 ***
Comment 4 gclark 2001-10-23 19:50:24 CDT
Created attachment 93
Same test case, this time with the correct mime type.