Bug 1299 - infinite loop in client after server restart over SSLIOP
Summary: infinite loop in client after server restart over SSLIOP
Status: RESOLVED FIXED
Alias: None
Product: TAO
Classification: Unclassified
Component: SSLIOP Pluggable Protocol (show other bugs)
Version: 1.2.4
Hardware: x86 Windows 2000
Importance: P5 normal
Assignee: Ossama Othman
URL:
Duplicates: 1341
Depends on:
Blocks: 1277
Reported: 2002-08-31 15:39 CDT by Christophe Vedel
Modified: 2002-11-30 22:42 CST
CC List: 3 users

See Also:


Attachments
server configuration file (461 bytes, text/plain)
2002-09-02 01:50 CDT, Christophe Vedel
client configuration file (323 bytes, text/plain)
2002-09-02 01:51 CDT, Christophe Vedel

Description Christophe Vedel 2002-08-31 15:39:08 CDT
A client makes synchronous invocations to a server over SSLIOP. The client 
itself is not running a TAO event loop. If the server is restarted between two 
requests, the client enters an infinite loop inside _TAO_xxx_Remote_Proxy_Impl 
when issuing the next request.
_tao_call.invoke is called in a loop with the following sequence:
 
TAO_GIOP_Twoway_Invocation::invoke
 TAO_GIOP_Synch_Invocation::invoke_i
  TAO_GIOP_Invocation::invoke
   TAO_SSLIOP_Transport::send_request
    TAO_SSLIOP_Transport::send_message
     TAO_Transport::send_message_shared
      TAO_Transport::check_event_handler_i
      -1 <-
     -1 <-
    -1 <-
   -1 <-
   TAO_Transport::close_connection
   TAO_SSLIOP_Endpoint::reset_hint
   TAO_INVOKE_RESTART <-
  TAO_INVOKE_RESTART <-
 TAO_INVOKE_RESTART <-
TAO_INVOKE_RESTART <-

Finally, PortableInterceptor::TRANSPORT_RETRY is returned and the client keeps 
retrying, but the connection is never re-established.
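The retry behavior in the trace above can be modelled with a small, self-contained C++ sketch. It assumes a hypothetical InvokeStatus enum standing in for TAO's TAO_INVOKE_OK / TAO_INVOKE_RESTART codes and a hypothetical invoke_with_restart helper; it is not the generated proxy code. The attempt cap exists only so that the sketch terminates; the loop described in the report has no such cap.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-ins for TAO's invocation status codes.
enum class InvokeStatus { Ok, Restart };

// Simplified model of the proxy loop: call invoke() again while it
// reports Restart. Returns the number of attempts made.
int invoke_with_restart(const std::function<InvokeStatus()>& invoke,
                        int max_attempts) {
  assert(max_attempts > 0);
  int attempts = 0;
  InvokeStatus status = InvokeStatus::Restart;
  // Without the cap, a transport that always fails would loop forever.
  while (status == InvokeStatus::Restart && attempts < max_attempts) {
    status = invoke();
    ++attempts;
  }
  return attempts;
}
```

With an invoke callback that always reports Restart, as a dead cached transport does, the loop runs until the cap; with one that succeeds on its second call, it stops after two attempts.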

The problem does not occur in version 5.2.3, where check_event_handler_i does 
not return an error and _tao_call.invoke produces the following sequence:

TAO_GIOP_Twoway_Invocation::invoke
 TAO_GIOP_Synch_Invocation::invoke_i
  TAO_GIOP_Invocation::invoke -> 
  TAO_INVOKE_OK <-
  TAO_Wait_On_Reactor::wait
  -1 <-
  TAO_ORB_Core::service_raise_comm_failure

The client catches a COMM_FAILURE and if the call is retried at the application 
level, a new connection is established and the call succeeds.
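Under 5.2.3 the client could therefore recover by retrying at the application level. A minimal sketch of such a retry wrapper follows. CommFailure is a hypothetical stand-in for CORBA::COMM_FAILURE so that the example compiles without TAO, and call_with_retry is an illustrative helper, not a TAO API.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for CORBA::COMM_FAILURE; real client code would
// catch CORBA::COMM_FAILURE instead.
struct CommFailure : std::runtime_error {
  CommFailure() : std::runtime_error("COMM_FAILURE") {}
};

// Calls op(); if it throws CommFailure, retries up to `retries` more
// times, giving the ORB a chance to open a fresh connection each time.
template <typename Op>
auto call_with_retry(Op op, int retries) -> decltype(op()) {
  for (;;) {
    try {
      return op();
    } catch (const CommFailure&) {
      if (retries-- <= 0) throw;  // out of retries: rethrow the failure
    }
  }
}
```

In the scenario from the report, the first call after the server restart would raise the exception and the single retry would succeed over a new connection.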

The svc.conf file for the server (Linux) contains:

dynamic SSLIOP_Factory Service_Object * TAO_SSLIOP:_make_TAO_SSLIOP_Protocol_Factory() "-SSLAuthenticate SERVER -SSLCertificate PEM:/usr/lib/ssl/certs/ltccert.pem -SSLPrivateKey PEM:/usr/lib/ssl/private/ltckey.pem"
dynamic Advanced_Resource_Factory Service_Object * TAO_Strategies:_make_TAO_Advanced_Resource_Factory () "-ORBProtocolFactory SSLIOP_Factory -ORBReactorType select_st -ORBInputCDRAllocator null"
TAO_Strategies:_make_TAO_Advanced_Resource_Factory () "-ORBReactorType select_st -ORBInputCDRAllocator null"
static Server_Strategy_Factory "-ORBPOALock null "

The svc.conf file for the client (Windows 2000) contains:

dynamic SSLIOP_Factory Service_Object * TAO_SSLIOP:_make_TAO_SSLIOP_Protocol_Factory() "-SSLAuthenticate SERVER"
dynamic Advanced_Resource_Factory Service_Object * TAO_Strategies:_make_TAO_Advanced_Resource_Factory () "-ORBProtocolFactory SSLIOP_Factory"
static Client_Strategy_Factory "-ORBClientConnectionHandler ST"
Comment 1 Nanbor Wang 2002-08-31 16:10:41 CDT
Hi

Could you provide a simple test that demonstrates the problem? 

Thanks
Comment 2 Christophe Vedel 2002-09-02 01:50:50 CDT
Created attachment 137
server configuration file
Comment 3 Christophe Vedel 2002-09-02 01:51:13 CDT
Created attachment 138
client configuration file
Comment 4 Christophe Vedel 2002-09-02 01:54:39 CDT
Using the svc.conf files provided for the client and server, the bug can be 
reproduced with the Simple/echo example, as follows:
1. Launch the server: server -o ior.txt -ORBSvcConf server.conf
2. Launch the client: client -f ior.txt -ORBSvcConf client.conf
3. At the prompt, enter a string and get the echo.
4. After the client has displayed a new prompt, restart the server.
5. After the server has restarted, enter a string in the client. The client 
   enters an infinite loop in _TAO_Echo_Remote_Proxy_Impl::echo_string.
I attached the svc.conf files I used. I guess the certificates from the 
security sample can be used. Thanks.
Comment 5 Christophe Vedel 2002-09-03 10:49:23 CDT
Some more information. The difference in behavior between 5.2.3 and 5.2.4 comes 
from the fact that in the new version the transport remains in the cache 
instead of being purged. The loop is infinite because, instead of a new 
transport being created, the transport associated with the closed connection is 
returned from the cache. That transport does not work, so the client keeps 
retrying.
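The cache behavior described above can be illustrated with a toy transport cache. TransportCache and Transport are hypothetical classes, far simpler than TAO's transport cache manager; the sketch only shows that closing a transport without purging it makes the next lookup return the same dead transport, whereas purging forces a new connection.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical, heavily simplified transport.
struct Transport {
  bool open = true;
};

// Hypothetical cache keyed by endpoint name.
class TransportCache {
 public:
  // Returns the cached transport, creating (connecting) one if absent.
  std::shared_ptr<Transport> find_or_connect(const std::string& endpoint) {
    auto it = cache_.find(endpoint);
    if (it != cache_.end()) return it->second;  // reused even if closed
    auto t = std::make_shared<Transport>();
    cache_[endpoint] = t;
    ++connects_;
    return t;
  }

  // Closes the transport; removes it from the cache only if purge is true.
  void close(const std::string& endpoint, bool purge) {
    auto it = cache_.find(endpoint);
    if (it == cache_.end()) return;
    it->second->open = false;
    if (purge) cache_.erase(it);
  }

  int connects() const { return connects_; }

 private:
  std::map<std::string, std::shared_ptr<Transport>> cache_;
  int connects_ = 0;
};
```

Without the purge, every retry receives the closed transport and no new connection is ever attempted, which matches the loop in the report.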
I think the problem is that close_connection does not purge the cache, because 
it passes 1 as the disable_purge argument:
void
TAO_Transport::close_connection (void)
{
  ACE_Event_Handler * eh = this->invalidate_event_handler ();
  this->close_connection_shared (1, eh);
}

This might be a misunderstanding of the meaning of disable_purge, since the 
no-purge variant passes the opposite value:

void
TAO_Transport::close_connection_no_purge (void)
{
  ACE_Event_Handler * eh = this->invalidate_event_handler ();
  this->close_connection_shared (0, eh);
}

Here 0 is passed, so the entry is purged; if the function name is right, 1 
should be passed to avoid the purge. What do you think? 
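To make the inverted flag concrete, here is a stripped-down model of the two functions quoted above. It keeps the 1.2.4 call-site arguments but drops the event handler and the cache_map_entry_ check; OldTransport and its purged member are hypothetical names used only for illustration.

```cpp
#include <cassert>

// Model of the 1.2.4 code: the parameter is named disable_purge, so a
// non-zero value skips the purge.
struct OldTransport {
  bool purged = false;

  void close_connection_shared(int disable_purge) {
    if (!disable_purge) purged = true;  // stands in for purge_entry()
  }
  // Arguments as quoted in the report.
  void close_connection() { close_connection_shared(1); }           // no purge
  void close_connection_no_purge() { close_connection_shared(0); }  // purges
};
```

Each function does the opposite of what its name suggests, which is exactly the mismatch pointed out above.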
Comment 6 Nanbor Wang 2002-09-15 08:28:52 CDT
Adding myself to the CC list
Comment 7 Ossama Othman 2002-09-24 12:08:25 CDT
I'm not sure if this bug is mine, but I'll accept it for now.

Cleared "Target" fields.  They are only meant to be used for cross-compiled targets.
Comment 8 Ossama Othman 2002-10-24 17:38:48 CDT
*** Bug 1341 has been marked as a duplicate of this bug. ***
Comment 9 Knut-Håvard Aksnes 2002-11-08 03:34:45 CST
Adding myself to the CC list
Comment 10 Ossama Othman 2002-11-13 18:21:01 CST
I haven't been able to reproduce this problem using the version of TAO_SSLIOP in
our CVS repository.  I get a CORBA::COMM_FAILURE as expected.  However, I ran
both the client and the server on the same Linux box, although I don't see how
that would make a difference (yet).
Comment 11 Christophe Vedel 2002-11-29 13:31:58 CST
Indeed, I cannot reproduce the problem in version 1.2.6.

I guess the change in TAO_Transport::close_connection_shared from:

void
TAO_Transport::close_connection_shared (int disable_purge,
                                        ACE_Event_Handler * eh)
{
  // Purge the entry
  if (!disable_purge && this->cache_map_entry_ != 0)
    {
      this->transport_cache_manager ().purge_entry (this->cache_map_entry_);
    }

...

to

void
TAO_Transport::close_connection_shared (int purge,
                                        TAO_Connection_Handler * eh)
{
  // Purge the entry
  if (purge)
    {
      this->transport_cache_manager ().purge_entry (this->cache_map_entry_);
    }

....
without changing the first argument at the call sites did the trick: the 
callers had been written as if the parameter meant purge rather than 
disable_purge.
Thanks for the fix.
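With the parameter renamed to purge, the unchanged arguments (1 in close_connection, 0 in close_connection_no_purge) now match the function names. One way to make this kind of inversion harder, sketched here as a suggestion rather than as TAO's actual fix, is to replace the int flag with a scoped enum. Purge, Transport and its purged member are hypothetical names.

```cpp
#include <cassert>

// Hypothetical scoped enum: the intent is spelled out at every call site,
// so renaming a parameter cannot silently invert its meaning.
enum class Purge { Yes, No };

struct Transport {
  bool purged = false;

  void close_connection_shared(Purge purge) {
    if (purge == Purge::Yes) purged = true;  // stands in for purge_entry()
  }
  void close_connection() { close_connection_shared(Purge::Yes); }
  void close_connection_no_purge() { close_connection_shared(Purge::No); }
};
```

Passing a bare 0 or 1 to close_connection_shared no longer compiles, so each caller has to state which behavior it wants.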

Comment 12 Nanbor Wang 2002-11-30 22:42:49 CST
From user feedback, this seems to have been fixed. Here is the relevant 
ChangeLog entry:

Thu Nov 14 14:33:21 2002  Balachandran Natarajan  <bala@isis-server.isis.vanderbilt.edu>