Bug 1074 - TP_Reactor and Transport_Cache_Manager race to close a connection.
Status: RESOLVED DUPLICATE of bug 1031
Alias: None
Product: TAO
Classification: Unclassified
Component: ORB
Version: 1.2
Hardware: All All
Importance: P3 normal
Assignee: DOC Center Support List (internal)
 
Reported: 2001-10-23 18:55 CDT by gclark
Modified: 2001-10-23 19:50 CDT

Attachments
Test case. (5.46 KB, application/octet-stream)
2001-10-23 18:56 CDT, gclark
Same test case, this time with the correct mime type. (5.46 KB, application/x-zip-compressed)
2001-10-23 19:50 CDT, gclark

Description gclark 2001-10-23 18:55:19 CDT
While responding to a socket event that signals the closure of an
outgoing connection, the TP_Reactor may access the IIOP connection
handler during a window in which neither the handler nor its associated
Transport object is locked or marked busy in any way.  A failure can
occur if the transport cache manager concurrently selects this
connection for purging.


FAILURE SCENARIO:
-----------------

1.  Server "A" establishes a connection to some remote server "B".
2.  B decides to close the connection at a time when it is quiescent.
3.  In A, the closure of the connection causes the socket to become ready
    for reading.
4.  Hence the leader thread returns from select() and calls
      ACE_TP_Reactor::handle_socket_events().
5.  handle_socket_events() locates the IIOP_Connection_Handler associated
    with the ready file handle.
6.  handle_socket_events() releases the reactor token.
    (Note that at this point the connection is marked
    "ACE_RECYCLABLE_IDLE_AND_PURGABLE".)
7.  Another thread makes a method invocation on a remote object, causing
    the transport cache manager to purge the connection found in step 5.
    This deletes the IIOP_Connection_Handler.
8.  Back in the leader thread, handle_socket_events() then tries to invoke
    a callback method on the deleted IIOP_Connection_Handler.
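
The ordering in steps 5 through 8 can be replayed deterministically in a
small model.  This is only a sketch: Handler, Cache, and
handler_survives_until_upcall() are hypothetical stand-ins, not ACE or
TAO classes, and a std::weak_ptr takes the place of the raw pointer the
reactor holds so that the dangling state can be observed without
actually touching freed memory.

```cpp
#include <cassert>
#include <map>
#include <memory>

// Hypothetical stand-in for IIOP_Connection_Handler.
struct Handler {
    int handle;
};

// Hypothetical stand-in for the transport cache: it is the sole owner
// of each handler, so purging an entry destroys the handler.
struct Cache {
    std::map<int, std::shared_ptr<Handler>> entries;

    void add(int handle) {
        entries[handle] = std::make_shared<Handler>(Handler{handle});
    }
    void purge(int handle) { entries.erase(handle); }
};

// Replays steps 5-8 in a single thread.  Returns true if the handler
// located in step 5 still exists when the upcall in step 8 would run.
bool handler_survives_until_upcall(bool purge_between_lookup_and_upcall) {
    Cache cache;
    cache.add(7);

    // Step 5: the reactor locates the handler.  A weak_ptr models the
    // non-owning raw pointer the reactor keeps.
    std::weak_ptr<Handler> found = cache.entries.at(7);
    assert(!found.expired());

    // Step 6: the reactor token is released; nothing protects the handler.

    // Step 7: another thread's invocation triggers a purge.
    if (purge_between_lookup_and_upcall) {
        cache.purge(7);
    }

    // Step 8: the upcall is only safe if the handler is still alive.
    return !found.expired();
}
```

Calling handler_survives_until_upcall(true) models the failing
interleaving; with a raw pointer in place of the weak_ptr, step 8 would
dereference freed memory.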

The attached program demonstrates the bug.

I expect the bug affects all platforms, but I have tested the program on 
Windows 2000 only.



POSSIBLE FIX:
-------------

I don't have a fix or workaround yet.  We are considering replacing the
pending_upcalls counter with a more general reference count in the
ACE_Event_Handler object.  The transport cache manager and the
reactor would "hold" references to the ACE_Event_Handler (which is
the IIOP_Connection_Handler) for appropriate periods to prevent the
race.
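
A minimal sketch of the reference-counting idea, under stated
assumptions: RefCountedHandler, add_ref(), remove_ref(), and
replay_with_refcount() are hypothetical names for illustration and are
not the ACE_Event_Handler interface.  The cache and the reactor each
hold one reference, and the handler is deleted only when the last
reference is dropped.

```cpp
#include <atomic>

// Hypothetical reference-counted handler (not the real ACE class).
class RefCountedHandler {
public:
    // Starts with one reference, owned by whoever created the handler.
    explicit RefCountedHandler(bool* destroyed_flag)
        : refs_(1), destroyed_(destroyed_flag) {}

    void add_ref() { refs_.fetch_add(1, std::memory_order_relaxed); }

    // Drops one reference; the last holder to let go deletes the handler.
    void remove_ref() {
        if (refs_.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            delete this;
        }
    }

    void handle_input() { ++upcalls_; }

private:
    // Private so the handler can only be destroyed through remove_ref().
    ~RefCountedHandler() {
        if (destroyed_) *destroyed_ = true;
    }

    std::atomic<int> refs_;
    bool* destroyed_;  // lets the caller observe destruction
    int upcalls_ = 0;
};

struct Outcome {
    bool alive_during_upcall;
    bool destroyed_after;
};

// Replays the failure scenario with both parties holding references.
Outcome replay_with_refcount() {
    bool destroyed = false;
    RefCountedHandler* h = new RefCountedHandler(&destroyed);  // cache's reference

    h->add_ref();     // step 5: reactor takes a reference before releasing the token
    h->remove_ref();  // step 7: cache purges the connection, dropping its reference
    bool alive = !destroyed;

    h->handle_input();  // step 8: upcall is safe, the reactor's reference keeps it alive
    h->remove_ref();    // reactor is done; last reference gone, handler deleted

    return Outcome{alive, destroyed};
}
```

In this model the purge in step 7 no longer deletes the handler
outright; deletion is deferred until the reactor releases its reference
after the upcall.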
Comment 1 gclark 2001-10-23 18:56:03 CDT
Created attachment 92
Test case.
Comment 2 Nanbor Wang 2001-10-23 19:02:50 CDT
IMHO, it is a duplicate of bug 1020. We have a bugzilla id #1031 which talks 
about the improvements that we want to make to the TP_Reactor. Can we use your 
testcase for bug 1020?
Comment 3 Nanbor Wang 2001-10-23 19:10:55 CDT
I am sort of convinced that this is more or less a duplicate of 1031 (and hence 1020)

*** This bug has been marked as a duplicate of 1031 ***
Comment 4 gclark 2001-10-23 19:50:24 CDT
Created attachment 93
Same test case, this time with the correct mime type.