1354 – About TAO Connection Close

Summary: About TAO Connection Close

Status:	RESOLVED DUPLICATE of bug 1020

Alias:	None

Product:	TAO
Classification:	Unclassified
Component:	ORB
Version:	1.2.5
Hardware:	x86 Windows 2000

Importance:	P3 major
Assignee:	DOC Center Support List (internal)

URL:

Depends on:
Blocks:

Reported:	2002-11-04 01:52 CST by Wanjia
Modified:	2002-11-08 21:41 CST
CC List:	0 users

See Also:

Description Wanjia 2002-11-04 01:52:30 CST

TAO VERSION: 1.2.5 (CVS 2002-10-31)
ACE VERSION: 5.2.5

 HOST MACHINE, TARGET MACHINE and OPERATING SYSTEM: WINDOWS 2000 SERVER
 COMPILER NAME AND VERSION (AND PATCHLEVEL): MS VC++ 6.0 SP4

 AREA/CLASS/EXAMPLE AFFECTED:
 0xc0000005 Access Violation

 DOES THE PROBLEM AFFECT:
 COMPILATION?
 No
 LINKING?
 No
 EXECUTION?
 Yes

SYNOPSIS:
I ran a TAO test program under this version and found some defects in
connection closing. I use these strategies in svc.conf:

    static Server_Strategy_Factory "-ORBconcurrency thread-per-connection"
    static Client_Strategy_Factory "-ORBTransportMuxStrategy exclusive" 

If the server goes down, the client process detects the TCP connection close
in select() and calls Connection_Handler::close_connection_eh, holding a
pointer to the IIOP_Connection_Handler in the ORB_run thread (call it thread 1).
Meanwhile, one of the client threads (call it thread 2) returns from
wait_for_event because the state was changed by
send_connection_closed_notifications in handle_input_i or in
Connection_Handler::close_connection_eh (I use the leader/follower strategy).
Thread 2 then calls raise_service_comm_failure and enters
IIOP_Connection_Handler::close_connection on the same connection handler that
thread 1 is using. This causes several problems:
1) If thread 1 is blocked somewhere, thread 2 decrements the refcount of this
 connection_handler in "transport(0)" and in the decr_refcount at the end of
 close_connection_eh. The refcount then reaches 0, so thread 2 deletes the
 connection_handler. But thread 1 is still inside close_connection_eh; if it
 reads or writes any member of the connection_handler, it accesses deleted
 memory and the behavior is undefined.
2) The behavior depends on the execution order of the threads, and it may
 differ on a two-CPU machine where these threads are bound to different
 LWPs/kernel threads.
3) Even if thread 1 always runs ahead of thread 2, it can still happen that
 thread 1 calls "delete this" while thread 2 is still processing data that
 belongs to "this".


I think this is caused by several threads calling the shared connection_handler's
close_connection, which does "delete this", at the same time, so we should
ensure that only one thread calls close_connection. Both close_connection and
raise_service_comm_failure also operate on the shared XXX_transport, so access
to it should be strictly synchronized as well.
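The "only one thread may call close_connection" requirement can be sketched with an atomic flag: the first thread to flip it performs the close, and every later caller returns immediately. The shared transport state is additionally guarded by a mutex. This is a minimal C++11 illustration under hypothetical names (CloseOnce, race_to_close), not TAO code and not a description of how bug 1020 was resolved.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical sketch: a close routine that runs at most once,
// however many threads detect the broken connection.
class CloseOnce {
 public:
  // Returns true only for the single caller that actually performed the close.
  bool close_connection() {
    bool expected = false;
    if (!closing_.compare_exchange_strong(expected, true, std::memory_order_acq_rel)) {
      return false;  // another thread already owns the close
    }
    std::lock_guard<std::mutex> guard(transport_lock_);
    transport_attached_ = false;  // detach the shared transport under the lock
    ++close_count_;
    return true;
  }

  bool transport_attached() {
    std::lock_guard<std::mutex> guard(transport_lock_);
    return transport_attached_;
  }

  int close_count() {
    std::lock_guard<std::mutex> guard(transport_lock_);
    return close_count_;
  }

 private:
  std::atomic<bool> closing_{false};
  std::mutex transport_lock_;
  bool transport_attached_ = true;
  int close_count_ = 0;
};

// Several threads race to close; returns how many believed they did the close.
int race_to_close(CloseOnce& c, int threads) {
  std::atomic<int> winners{0};
  std::vector<std::thread> pool;
  for (int i = 0; i < threads; ++i) {
    pool.emplace_back([&] { if (c.close_connection()) winners.fetch_add(1); });
  }
  for (auto& t : pool) t.join();
  return winners.load();
}
```

The losing threads can then report the failure (for example via raise_service_comm_failure in the scenario above) without touching the handler's teardown.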

I wonder whether we can change it as follows:
1) Do not call send_connection_closed_notifications in
   TAO_Transport::handle_input_i or in the other input-processing methods.
   If the connection is broken, select() will report it and we can handle
   that case in handle_close. This keeps the follower thread waiting until
   the connection is closed.
2) Do not call state_changed in close_connection_eh before transport(0),
   because transport(0) calls it. After transport(0) we wake up the follower
   thread; it cannot obtain the event_handler pointer because transport(0)
   has already set it to zero, and the transport itself is protected by its
   ref_count, so this should be safe.
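The ordering proposed in 2), detaching the handler with transport(0) first and only then waking the follower, can be illustrated with a mutex and a condition variable. In this sketch TransportSketch, detach_and_notify and wait_and_check_handler are hypothetical names, and std::condition_variable stands in for TAO's leader/follower wakeup machinery; it only shows that a follower woken after the detach finds a null handler pointer and has nothing left to close.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical handler type standing in for the connection handler.
struct Handler { int id; };

// Hypothetical sketch of the proposed ordering: clear the handler pointer,
// then wake the follower.
class TransportSketch {
 public:
  explicit TransportSketch(Handler* h) : handler_(h) {}

  // Reactor side (handle_close path): detach first, then notify.
  void detach_and_notify() {
    {
      std::lock_guard<std::mutex> guard(lock_);
      handler_ = nullptr;  // analogous to transport(0)
      closed_ = true;
    }
    cv_.notify_all();  // followers are woken only after the detach
  }

  // Follower side (wait_for_event): block until closed, then report
  // whether a handler pointer is still visible.
  bool wait_and_check_handler() {
    std::unique_lock<std::mutex> guard(lock_);
    cv_.wait(guard, [this] { return closed_; });
    return handler_ != nullptr;
  }

 private:
  std::mutex lock_;
  std::condition_variable cv_;
  Handler* handler_;
  bool closed_ = false;
};

// Returns true if the follower still saw a handler pointer after waking.
bool follower_sees_handler() {
  Handler h{1};
  TransportSketch t(&h);
  bool seen = true;
  std::thread follower([&] { seen = t.wait_and_check_handler(); });
  std::thread reactor([&] { t.detach_and_notify(); });
  follower.join();
  reactor.join();
  return seen;
}
```

Because closed_ and handler_ change in the same critical section, the follower can never observe "closed" together with a non-null handler, whichever thread is scheduled first.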

These thoughts may be incomplete or incorrect. I hope you can tell me how to
solve this problem, or whether my solution is viable. Thanks!!

	Wanjia

Comment 1 Tao Lu 2002-11-08 20:16:20 CST

Same bug as 1020.

Comment 2 Nanbor Wang 2002-11-08 21:41:55 CST

Duplicate of bug 1020

*** This bug has been marked as a duplicate of 1020 ***