Please report new issues athttps://github.com/DOCGroup
TAO VERSION: 1.1.17 ACE VERSION: 5.1.17 HOST MACHINE and OPERATING SYSTEM: Intel, Linux RedHat 6.2 TARGET MACHINE and OPERATING SYSTEM, if different from HOST: same COMPILER NAME AND VERSION (AND PATCHLEVEL): version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release) AREA/CLASS/EXAMPLE AFFECTED: Multi-threaded applications using TAO with SSLIOP. DOES THE PROBLEM AFFECT: EXECUTION? Multi-threaded applications using TAO with SSLIOP. SYNOPSIS: Multi-threaded applcations using TAO with SSLIOP are not working properly. DESCRIPTION: I have a multi-threaded client and server using TAO and SSLIOP. The applications work fine without SSLIOP. As soon as I try to run the client and server with multiple threads, I run into problems. 1) If I run the client program with multiple threads, the client either prints "pure virtual method called" and blocks indefinitely or core dumps. 2) If I run multiple clients (with one thread) against the server, eventually it prints "pure virtual method called" and locks up. REPEAT BY: I can provide a sample client and server that recreate the error. The README in the tarball describes how to run the client and the server to recreate the errors. SAMPLE FIX/WORKAROUND: Haven't found one yet.
Created attachment 68 [details] This is a tarball file that includes a test case for the bug.
It sounds like we may not be providing enough synchronization in the SSLIOP pluggable protocol. OpenSSL is not entirely thread-safe so we may have to add our own synchronization when performing sends/recvs. From what I recall, SSL_read()/SSL_write() are not thread-safe, which may be the crux of the problem.
Made this bug a "blocker" bug since it is important that SSLIOP work in multithreaded applications.
Status update... I've fixed several race conditions in the code that sets up the TSS portion of the "SSLIOPCurrent" object. However, another problem occurs when performing a non-blocking SSLIOP accept() with the TP Reactor. Once that problem is fixed, I should be able to mark this bugzilla entry as fixed. Working on it... Descriptions of the changes made when fixing the "SSLIOPCurrent" related race conditions follow: Fri Sep 21 21:39:22 2001 Ossama Othman <ossama@uci.edu> * orbsvcs/orbsvcs/SSLIOP/SSLIOP_Util.h: * orbsvcs/orbsvcs/SSLIOP/SSLIOP_Util.cpp: New SSLIOP utility class containing code common to both the SSLIOP client side and the server side. * orbsvcs/orbsvcs/SSLIOP/IIOP_SSL_Acceptor.h: * orbsvcs/orbsvcs/SSLIOP/IIOP_SSL_Connector.h: * orbsvcs/orbsvcs/SSLIOP/SSLIOP_Acceptor.h: * orbsvcs/orbsvcs/SSLIOP/SSLIOP_Connector.h: Added new attributed that contains the state to be passed to each new TAO_SSLIOP_Connection_Handler. Doxygenated this header. * orbsvcs/orbsvcs/SSLIOP/IIOP_SSL_Acceptor.cpp (open_i): * orbsvcs/orbsvcs/SSLIOP/IIOP_SSL_Connector.cpp (open): * orbsvcs/orbsvcs/SSLIOP/SSLIOP_Acceptor.cpp (ssliop_open_i): * orbsvcs/orbsvcs/SSLIOP/SSLIOP_Connector.cpp (open): Invoke the new TAO_SSLIOP_Util::setup_handler_state() method, and pass the created state to the creation strategy. * orbsvcs/orbsvcs/SSLIOP/IIOP_SSL_Transport.h: Cache a reference to the SSLIOP::Current object to avoid any reference resolution overhead in the critical path. * orbsvcs/orbsvcs/SSLIOP/IIOP_SSL_Transport.cpp (handle_input_i): Pass cached SSLIOP::Current reference to the TAO_Null_SSL_State_Guard. Saves an ORB::resolve_initial_references() in th critical path. * orbsvcs/orbsvcs/SSLIOP/SSLIOP_Current.h: Generated and added "TAO_SSLIOP_Current" downcast and reference count related methods. For example, it is now possible to _narrow() a SSLIOP::Current object reference to TAO_SSLIOP_Current object reference. This makes it possible to gain access to the methods in TAO_SSLIOP_Current not defined in the SSLIOP::Current IDL without relying on dynamic_cast, which may not work properly when using compilers that do not support RTTI. (orb_id_): No longer any need for this attribute. (See below.) * orbsvcs/orbsvcs/SSLIOP/SSLIOP_Current.cpp (TAO_SSLIOP_Current): This constructor now accepts a pointer to the ORB Core. This obviates the need to cache the ORBid. (init): Removed this method. It is no longer needed since a pointer to the ORB Core is now cached in the TAO_SSLIOP_Current instance at ORB bootstrap time. (setup, teardown): These methods now accept parameters that contain the state that will be used when setting up the TSS portion of the SSLIOP::Current object. This makes these mades thread-safe and reentrant without introducing any locks. [Bug 943] * orbsvcs/orbsvcs/SSLIOP/SSLIOP_Current.inl (implementation): Don't call init(). That method has been removed. * orbsvcs/orbsvcs/SSLIOP/SSLIOP_ORBInitializer.cpp (pre_init): Retrieve a pointer to the ORB Core from the TAO_ORBInitInfo object and pass it to the TAO_SSLIOP_Current constructor when creating the TAO_SSLIOP_Current object. * orbsvcs/orbsvcs/SSLIOP/SSLIOP_Connection_Handler.h: * orbsvcs/orbsvcs/SSLIOP/SSLIOP_Connection_Handler.cpp (TAO_SSLIOP_Connection_Handler_State): New class that contains state that must be passed to a TAO_SSLIOP_Connection_Handler up on creation. (TAO_SSL_State_Guard): No need to export this class. It is only used internally by the TAO_SSLIOP library. This class now stores additional state. Specifically, it contains a pointer to the "previous" SSLIOP::Current implementation that was stored in TSS, the current SSLIOP::Current implementation, and a flag that specifies whether or not the TSS portion of SSLIOP::Current was setup. This state is passed to the connection handler each time a request is handled. Since this class is instantiated each time a request is handled, the encapsulated state is request-specific, meaning that SSL state setup in TSS is reentrant and thread-safe. [Bug 943] No longer any need to cache a pointer to the ORB Core. Adjusted the constructor signature and removed the attribute accordingly. (TAO_SSLIOP_Connection_Handler): This connection handler no longer retains an instance of the TAO_SSLIOP_Current_Impl object. That object was subject to a race condition. [Bug 943] (current): New methods to get and set the cached TAO_SSLIOP_Current reference. There is no longer any need to perform a dynamic_cast on the SSLIOP::Current object reference to gain access to TAO_SSLIOP_Current-specific methods. (setup_ssl_state): It is no longer necessary to retrieve the SSLIOP::Current object reference during the first attempt to set up the TSS portion of the SSLIOP::Current object. That object reference is now retrieved and cached at ORB bootstrap time. (See the description of the changes to SSLIOP_ORBInitializer.cpp above for details.) (setup_ssl_state, teardown_ssl_state): These methods now accept parameters that contain the state that will be used when setting up the TSS portion of the SSLIOP::Current object. This makes these mades thread-safe and reentrant without introducing any locks. [Bug 943] * orbsvcs/orbsvcs/SSLIOP/SSLIOP_Connection_Handler.i: Updated the calls to the setup/teardown_ssl_state() in the TAO_SSL_State_Guard constructor and destructor to match the new method signatures. * orbsvcs/orbsvcs/SSLIOP/SSLIOP_Transport.cpp (handle_input_i): The signature of the TAO_SSL_State_Guard constructor changed. Adjusted the instantiation accordingly. * orbsvcs/orbsvcs/SSLIOP/IIOP_SSL_Connection_Handler.h: * orbsvcs/orbsvcs/SSLIOP/IIOP_SSL_Connection_Handler.cpp: Changes analogous to those made in the SSLIOP_Connection_Handler counterparts. See above for details. * orbsvcs/orbsvcs/SSLIOP/IIOP_SSL_Connection_Handler.inl: Inlined the TAO_Null_SSL_State_Guard constructor and destructor for speed/efficiency reasons. * orbsvcs/orbsvcs/SSLIOP/SSLIOP_Current_Impl.h: Corrected destructor comment. * orbsvcs/orbsvcs/SSLIOP/SSLIOP_Current_Impl.inl: * orbsvcs/orbsvcs/SSLIOP/SSLIOP_Current_Impl.cpp: Inlined the TAO_SSLIOP_Current_Impl constructor for speed/efficiency reasons, i.e. since it is instantiated in the critical path. * orbsvcs/orbsvcs/Makefile.SSLIOP (FILES): * orbsvcs/orbsvcs/SSLIOP.bor (OBJFILES): Added new SSLIOP_Util sources to the list of files to compile and link.
Note that the above changes to the "SSLIOPCurrent" related code do *not* introduce any locks or additional TSS accesses. In fact, the changes should make TAO'S SSLIOP implementation more efficient due to less overhead.
Downgrading severity to critical until after TAO 1.2 is released. Many improvements/fixes have been made, but there is apparently one lingering race condition that I will address after the release. TAO 1.2 should not be held up due to a problem that is not part of the ORB core or ACE.
Some more improvements/fixes. With the below fixes and updates in place, the number of TAO/orbsvcs/tests/Security/MT_SSLIOP test failures have decreased. The test actually passes from time to time. Apparently there are still some race conditions that must be addressed. In ACE_SSL: Sat Oct 06 23:06:41 2001 Ossama Othman <ossama@uci.edu> * ace/SSL/SSL_Accept_Handler.cpp (ssl_accept): * ace/SSL/SSL_Connect_Handler.cpp (ssl_connect): * ace/SSL/SSL_SOCK_Stream.i (send_i, recv_i, close): Removed the do/while(SSL_pending()) loop. Its introduction into the code was a bit misguided. There is no need to flush the SSL buffer before returning to the Reactor's event loop since the code now sends a notification to the Reactor to force the Reactor to invoke the event handler before waiting in the event loop. This fixes some SSL protocol errors that manifested during multithreaded SSL communication. [Bug 943] * ace/SSL/SSL_SOCK_Acceptor.cpp (ssl_accept): It is no longer necessary to call SSL_accept()/connect() to flush the SSL buffer before running the Reactor's event loop since the Reactor is notified that it should call the appropriate event handler before blocking on select() for WaitForMultipleObjects(), for example. * ace/SSL/SSL_SOCK_Stream.cpp (reactor, handler): New methods to set the Reactor and event handler associated with the ACE_SSL_SOCK_Stream. If each of these is set, then the Reactor will be notified if data is still pending in the SSL buffer. (~ACE_SSL_SOCK_Stream): Uninlined the destructor. There is no need for it to be inlined, and doing so only caused unnecessary code bloat. * ace/SSL/SSL_SOCK_Stream.h (notify, reactor, handler): Added these new method declarations, in addition to the corresponding attributes. Updated documentation. * ace/SSL/SSL_SOCK_Stream.i (notify): Method that pushes the event handler on to the Reactor's notification pipe when necessary. Invoked by recv_i() and send_i(). In TAO_SSLIOP: Sat Oct 06 23:41:11 2001 Ossama Othman <ossama@uci.edu> * orbsvcs/orbsvcs/SSLIOP/SSLIOP_Transport.cpp (register_handler_i): Set the reactor and handler in the ACE_SSL_SOCK_Stream so that the reactor is notified if data is still pending for read or write. This is necessary since SSL is record-oriented, and the underlying SSL implementation may buffer data. [Bug 943]
I found another problem. When configured to use the thread-per-connection concurrency model, the MT_SSLIOP test trips the assertion in TAO_SSLIOP_Connection_Handler::handle_close(), i.e.: int TAO_SSLIOP_Connection_Handler::handle_close (ACE_HANDLE handle, ACE_Reactor_Mask rm) { if (TAO_debug_level) ACE_DEBUG ((LM_DEBUG, "TAO (%P|%t) SSLIOP_Connection_Handler::handle_close " "(%d, %d)\n", handle, rm)); long upcalls = this->decr_pending_upcalls (); ACE_ASSERT (upcalls >= 0); // Try to clean up things if the upcall count has reached 0 if (upcalls == 0) this->handle_close_i (); return 0; }
It appears likely that the problem exhibited by the MT_SSLIOP test where the reactor attempts to invoke an event handler that has already been deallocated is either caused by or related to the same bug(s) described in bug 1020 and/or 1031. As such, the SSLIOP pluggable protocol may not be at fault. However, this still needs to be confirmed.
Some recent changes to the ACE_SSL and TAO_SSLIOP code seem to have alleviated the problem where the reactor calls an event handler that has been deallocated. However, there are still some illusive race conditions somewhere in the I/O code.
Thanks to Steve's recent changes to the ACE_SSL wrappers, the last of the race conditions appear to have been eradicated. Go Steve!