Bug 943 - Multi-threaded applications using TAO with SSLIOP
Summary: Multi-threaded applications using TAO with SSLIOP
Status: RESOLVED FIXED
Alias: None
Product: TAO
Classification: Unclassified
Component: SSLIOP Pluggable Protocol (show other bugs)
Version: 1.1.17
Hardware: x86 Linux
: P2 critical
Assignee: Ossama Othman
URL:
Depends on: 1031 1020
Blocks:
  Show dependency tree
 
Reported: 2001-06-14 12:38 CDT by mtalley77
Modified: 2002-01-07 14:18 CST (History)
0 users

See Also:


Attachments
This is a tarball file that includes a test case for the bug. (399.33 KB, application/octet-stream)
2001-06-14 12:48 CDT, mtalley77
Details

Note You need to log in before you can comment on or make changes to this bug.
Description mtalley77 2001-06-14 12:38:44 CDT
TAO VERSION: 1.1.17
   ACE VERSION: 5.1.17

    HOST MACHINE and OPERATING SYSTEM: Intel, Linux RedHat 6.2
    TARGET MACHINE and OPERATING SYSTEM, if different from HOST: same
    COMPILER NAME AND VERSION (AND PATCHLEVEL): 
        version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)

    AREA/CLASS/EXAMPLE AFFECTED:
        Multi-threaded applications using TAO with SSLIOP.

    DOES THE PROBLEM AFFECT:
        EXECUTION?
          Multi-threaded applications using TAO with SSLIOP.

    SYNOPSIS:
        Multi-threaded applcations using TAO with SSLIOP are not working 
        properly.  

    DESCRIPTION:
         I have a multi-threaded client and server using TAO and SSLIOP.  
         The applications work fine without SSLIOP. As soon as I try to run 
         the client and server with multiple threads, I run into problems.
         
         1) If I run the client program with multiple threads, the client 
            either prints "pure virtual method called" and blocks 
            indefinitely or core dumps.
            
         2) If I run multiple clients (with one thread) against the server,
            eventually it prints "pure virtual method called" and locks up.     

    REPEAT BY:
        I can provide a sample client and server that recreate the error.  
        The README in the tarball describes how to run the client and the 
        server to recreate the errors.

    SAMPLE FIX/WORKAROUND:
        Haven't found one yet.
Comment 1 mtalley77 2001-06-14 12:48:45 CDT
Created attachment 68 [details]
This is a tarball file that includes a test case for the bug.
Comment 2 Ossama Othman 2001-06-14 12:57:47 CDT
It sounds like we may not be providing enough synchronization in the SSLIOP 
pluggable protocol.  OpenSSL is not entirely thread-safe so we may have to add 
our own synchronization when performing sends/recvs.  From what I recall, 
SSL_read()/SSL_write() are not thread-safe, which may be the crux of the 
problem.

Comment 3 Ossama Othman 2001-06-14 12:58:56 CDT
Made this bug a "blocker" bug since it is important that SSLIOP work in 
multithreaded applications.
Comment 4 Ossama Othman 2001-09-22 00:08:50 CDT
Status update...

I've fixed several race conditions in the code that sets up the TSS portion of 
the "SSLIOPCurrent" object.  However, another problem occurs when performing a 
non-blocking SSLIOP accept() with the TP Reactor.  Once that problem is fixed, 
I should be able to mark this bugzilla entry as fixed.  Working on it...

Descriptions of the changes made when fixing the "SSLIOPCurrent" related race 
conditions follow:

Fri Sep 21 21:39:22 2001  Ossama Othman  <ossama@uci.edu>

	* orbsvcs/orbsvcs/SSLIOP/SSLIOP_Util.h:
	* orbsvcs/orbsvcs/SSLIOP/SSLIOP_Util.cpp:

	  New SSLIOP utility class containing code common to both the
	  SSLIOP client side and the server side.

	* orbsvcs/orbsvcs/SSLIOP/IIOP_SSL_Acceptor.h:
	* orbsvcs/orbsvcs/SSLIOP/IIOP_SSL_Connector.h:
	* orbsvcs/orbsvcs/SSLIOP/SSLIOP_Acceptor.h:
	* orbsvcs/orbsvcs/SSLIOP/SSLIOP_Connector.h:

	  Added new attributed that contains the state to be passed to
	  each new TAO_SSLIOP_Connection_Handler.

	  Doxygenated this header.

	* orbsvcs/orbsvcs/SSLIOP/IIOP_SSL_Acceptor.cpp (open_i):
	* orbsvcs/orbsvcs/SSLIOP/IIOP_SSL_Connector.cpp (open):
	* orbsvcs/orbsvcs/SSLIOP/SSLIOP_Acceptor.cpp (ssliop_open_i):
	* orbsvcs/orbsvcs/SSLIOP/SSLIOP_Connector.cpp (open):

	  Invoke the new TAO_SSLIOP_Util::setup_handler_state() method,
	  and pass the created state to the creation strategy.

	* orbsvcs/orbsvcs/SSLIOP/IIOP_SSL_Transport.h:

	  Cache a reference to the SSLIOP::Current object to avoid any
	  reference resolution overhead in the critical path.

	* orbsvcs/orbsvcs/SSLIOP/IIOP_SSL_Transport.cpp (handle_input_i):

	  Pass cached SSLIOP::Current reference to the
	  TAO_Null_SSL_State_Guard.  Saves an
	  ORB::resolve_initial_references() in th critical path.

	* orbsvcs/orbsvcs/SSLIOP/SSLIOP_Current.h:

	  Generated and added "TAO_SSLIOP_Current" downcast and reference
	  count related methods.  For example, it is now possible to
	  _narrow() a SSLIOP::Current object reference to
	  TAO_SSLIOP_Current object reference.  This makes it possible to
	  gain access to the methods in TAO_SSLIOP_Current not defined in
	  the SSLIOP::Current IDL without relying on dynamic_cast, which
	  may not work properly when using compilers that do not support
	  RTTI.

	  (orb_id_):

	  No longer any need for this attribute.  (See below.)

	* orbsvcs/orbsvcs/SSLIOP/SSLIOP_Current.cpp (TAO_SSLIOP_Current):

	  This constructor now accepts a pointer to the ORB Core.  This
	  obviates the need to cache the ORBid.

	  (init):

	  Removed this method.  It is no longer needed since a pointer to
	  the ORB Core is now cached in the TAO_SSLIOP_Current instance at
	  ORB bootstrap time.

	  (setup, teardown):

	  These methods now accept parameters that contain the state that
	  will be used when setting up the TSS portion of the
	  SSLIOP::Current object.  This makes these mades thread-safe and
	  reentrant without introducing any locks.  [Bug 943]

	* orbsvcs/orbsvcs/SSLIOP/SSLIOP_Current.inl (implementation):

	  Don't call init().  That method has been removed.

	* orbsvcs/orbsvcs/SSLIOP/SSLIOP_ORBInitializer.cpp (pre_init):

	  Retrieve a pointer to the ORB Core from the TAO_ORBInitInfo
	  object and pass it to the TAO_SSLIOP_Current constructor when
	  creating the TAO_SSLIOP_Current object.

	* orbsvcs/orbsvcs/SSLIOP/SSLIOP_Connection_Handler.h:
	* orbsvcs/orbsvcs/SSLIOP/SSLIOP_Connection_Handler.cpp
	  (TAO_SSLIOP_Connection_Handler_State):

	  New class that contains state that must be passed to a
	  TAO_SSLIOP_Connection_Handler up on creation.

	  (TAO_SSL_State_Guard):

	  No need to export this class.  It is only used internally by the
	  TAO_SSLIOP library.

	  This class now stores additional state.  Specifically, it
	  contains a pointer to the "previous" SSLIOP::Current
	  implementation that was stored in TSS, the current
	  SSLIOP::Current implementation, and a flag that specifies
	  whether or not the TSS portion of SSLIOP::Current was setup.
	  This state is passed to the connection handler each time a
	  request is handled.  Since this class is instantiated each time
	  a request is handled, the encapsulated state is
	  request-specific, meaning that SSL state setup in TSS is
	  reentrant and thread-safe.  [Bug 943]

	  No longer any need to cache a pointer to the ORB Core.  Adjusted
	  the constructor signature and removed the attribute
	  accordingly.

	  (TAO_SSLIOP_Connection_Handler):

	  This connection handler no longer retains an instance of the
	  TAO_SSLIOP_Current_Impl object.  That object was subject to a
	  race condition.  [Bug 943]

	  (current):

	  New methods to get and set the cached TAO_SSLIOP_Current
	  reference.  There is no longer any need to perform a
	  dynamic_cast on the SSLIOP::Current object reference to gain
	  access to TAO_SSLIOP_Current-specific methods.

	  (setup_ssl_state):

	  It is no longer necessary to retrieve the SSLIOP::Current object
	  reference during the first attempt to set up the TSS portion of
	  the SSLIOP::Current object.  That object reference is now
	  retrieved and cached at ORB bootstrap time.  (See the
	  description of the changes to SSLIOP_ORBInitializer.cpp above
	  for details.)

	  (setup_ssl_state, teardown_ssl_state):

	  These methods now accept parameters that contain the state that
	  will be used when setting up the TSS portion of the
	  SSLIOP::Current object.  This makes these mades thread-safe and
	  reentrant without introducing any locks.  [Bug 943]

	* orbsvcs/orbsvcs/SSLIOP/SSLIOP_Connection_Handler.i:

	  Updated the calls to the setup/teardown_ssl_state() in the
	  TAO_SSL_State_Guard constructor and destructor to match the new
	  method signatures.

	* orbsvcs/orbsvcs/SSLIOP/SSLIOP_Transport.cpp (handle_input_i):

	  The signature of the TAO_SSL_State_Guard constructor changed.
	  Adjusted the instantiation accordingly.

	* orbsvcs/orbsvcs/SSLIOP/IIOP_SSL_Connection_Handler.h:
	* orbsvcs/orbsvcs/SSLIOP/IIOP_SSL_Connection_Handler.cpp:

	  Changes analogous to those made in the SSLIOP_Connection_Handler
	  counterparts.  See above for details.

	* orbsvcs/orbsvcs/SSLIOP/IIOP_SSL_Connection_Handler.inl:

	  Inlined the TAO_Null_SSL_State_Guard constructor and destructor
	  for speed/efficiency reasons.

	* orbsvcs/orbsvcs/SSLIOP/SSLIOP_Current_Impl.h:

	  Corrected destructor comment.

	* orbsvcs/orbsvcs/SSLIOP/SSLIOP_Current_Impl.inl:
	* orbsvcs/orbsvcs/SSLIOP/SSLIOP_Current_Impl.cpp:

	  Inlined the TAO_SSLIOP_Current_Impl constructor for
	  speed/efficiency reasons, i.e. since it is instantiated in the
	  critical path.

	* orbsvcs/orbsvcs/Makefile.SSLIOP (FILES):
	* orbsvcs/orbsvcs/SSLIOP.bor (OBJFILES):

	  Added new SSLIOP_Util sources to the list of files to	compile
	  and link.

Comment 5 Ossama Othman 2001-09-22 00:11:29 CDT
Note that the above changes to the "SSLIOPCurrent" related code do *not* 
introduce any locks or additional TSS accesses.  In fact, the changes should 
make TAO'S SSLIOP implementation more efficient due to less overhead.
Comment 6 Ossama Othman 2001-10-01 18:14:58 CDT
Downgrading severity to critical until after TAO 1.2 is released.

Many improvements/fixes have been made, but there is apparently one lingering
race condition that I will address after the release.  TAO 1.2 should not be
held up due to a problem that is not part of the ORB core or ACE.
Comment 7 Ossama Othman 2001-10-07 03:01:39 CDT
Some more improvements/fixes.  With the below fixes and updates in place, the 
number of TAO/orbsvcs/tests/Security/MT_SSLIOP test failures have decreased.  
The test actually passes from time to time.  Apparently there are still some 
race conditions that must be addressed.

In ACE_SSL:

Sat Oct 06 23:06:41 2001  Ossama Othman  <ossama@uci.edu>

	* ace/SSL/SSL_Accept_Handler.cpp (ssl_accept):
	* ace/SSL/SSL_Connect_Handler.cpp (ssl_connect):
	* ace/SSL/SSL_SOCK_Stream.i (send_i, recv_i, close):

	  Removed the do/while(SSL_pending()) loop.  Its introduction into
	  the code was a bit misguided.  There is no need to flush the SSL
	  buffer before returning to the Reactor's event loop since the
	  code now sends a notification to the Reactor to force the
	  Reactor to invoke the event handler before waiting in the event
	  loop.  This fixes some SSL protocol errors that manifested
	  during multithreaded SSL communication.  [Bug 943]

	* ace/SSL/SSL_SOCK_Acceptor.cpp (ssl_accept):

	  It is no longer necessary to call SSL_accept()/connect() to
	  flush the SSL buffer before running the Reactor's event loop
	  since the Reactor is notified that it should call the
	  appropriate event handler before blocking on select() for
	  WaitForMultipleObjects(), for example.

	* ace/SSL/SSL_SOCK_Stream.cpp (reactor, handler):

	  New methods to set the Reactor and event handler associated with
	  the ACE_SSL_SOCK_Stream.  If each of these is set, then the
	  Reactor will be notified if data is still pending in the SSL
	  buffer.

	  (~ACE_SSL_SOCK_Stream):

	  Uninlined the destructor.  There is no need for it to be
	  inlined, and doing so only caused unnecessary code bloat.

	* ace/SSL/SSL_SOCK_Stream.h (notify, reactor, handler):

	  Added these new method declarations, in addition to the
	  corresponding attributes.

	  Updated documentation.

	* ace/SSL/SSL_SOCK_Stream.i (notify):

	  Method that pushes the event handler on to the Reactor's
	  notification pipe when necessary.  Invoked by recv_i() and
	  send_i().

In TAO_SSLIOP:

Sat Oct 06 23:41:11 2001  Ossama Othman  <ossama@uci.edu>

	* orbsvcs/orbsvcs/SSLIOP/SSLIOP_Transport.cpp
	  (register_handler_i):

	  Set the reactor and handler in the ACE_SSL_SOCK_Stream so that
	  the reactor is notified if data is still pending for read or
	  write.  This is necessary since SSL is record-oriented, and the
	  underlying SSL implementation may buffer data.  [Bug 943]

Comment 8 Ossama Othman 2001-10-11 13:33:30 CDT
I found another problem.  When configured to use the thread-per-connection
concurrency model, the MT_SSLIOP test trips the assertion in
TAO_SSLIOP_Connection_Handler::handle_close(), i.e.:

int
TAO_SSLIOP_Connection_Handler::handle_close (ACE_HANDLE handle,
                                             ACE_Reactor_Mask rm)
{
  if (TAO_debug_level)
    ACE_DEBUG  ((LM_DEBUG,
                 "TAO (%P|%t) SSLIOP_Connection_Handler::handle_close "
                 "(%d, %d)\n",
                 handle,
                 rm));

  long upcalls = this->decr_pending_upcalls ();

  ACE_ASSERT (upcalls >= 0);

  // Try to clean up things if the upcall count has reached 0
  if (upcalls == 0)
    this->handle_close_i ();


  return 0;
}
Comment 9 Ossama Othman 2001-10-16 00:18:59 CDT
It appears likely that the problem exhibited by the MT_SSLIOP test where the 
reactor attempts to invoke an event handler that has already been deallocated 
is either caused by or related to the same bug(s) described in bug 1020 and/or 
1031.

As such, the SSLIOP pluggable protocol may not be at fault.  However, this 
still needs to be confirmed.

Comment 10 Ossama Othman 2001-12-17 11:32:48 CST
Some recent changes to the ACE_SSL and TAO_SSLIOP code seem to have alleviated 
the problem where the reactor calls an event handler that has been 
deallocated.  However, there are still some illusive race conditions somewhere 
in the I/O code.
Comment 11 Ossama Othman 2002-01-07 14:18:18 CST
Thanks to Steve's recent changes to the ACE_SSL wrappers, the last of the race
conditions appear to have been eradicated.  Go Steve!