Bug 2391 - Race condition in connection closure
Summary: Race condition in connection closure
Status: REOPENED
Alias: None
Product: TAO
Classification: Unclassified
Component: ORB (show other bugs)
Version: 1.4.7
Hardware: All All
: P3 normal
Assignee: DOC Center Support List (internal)
URL:
Depends on:
Blocks:
 
Reported: 2006-02-02 13:49 CST by ciju john
Modified: 2006-04-17 09:32 CDT (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description ciju john 2006-02-02 13:49:03 CST
When the client closes its connections, it sends an EOF to the server. The sever
reactor gets an event for that handler. It first suspends the handler and then
dispatches a thread to handle_input(). Some where down the lane, a read() is
done on the handle and automatically the handle is resumed. At this point
another thread CAN enter the reactor and since the handle being closed has been
resumed, the reactor will add it to its fd_set. As expected the reactor gets an
event on that handle and dispatches someone else to handle that event. In rare
conditions, this will lead to an appliation core as the second thread assumes
that an event trigger == valid handle. What this patch (seee below) does is
prevent the closing handler from being resumed. Thus it never does get added to
the reactor fd_set and the possibility of getting a nil handler is removed.

Index: Connection_Handler.cpp
===================================================================
RCS file:
/project/cvs-repository/ACE_wrappers-repository/TAO/tao/Connection_Handler.cpp,v
retrieving revision 1.62
diff -u -r1.62 Connection_Handler.cpp
--- Connection_Handler.cpp      2 Nov 2005 07:13:02 -0000       1.62
+++ Connection_Handler.cpp      2 Feb 2006 19:45:36 -0000
@@ -282,6 +282,8 @@
                   t_id, handle, h, return_value));
     }

+  if (return_value == -1)
+    resume_handle.set_flag (TAO_Resume_Handle::TAO_HANDLE_LEAVE_SUSPENDED);
   return return_value;
 }

This race is easy to verify. Start up any multi-threaded server with debuglevel
== 10. When a client disconnects, the server log will show two threads trying to
close the same handle. Post-patch only one connection closure is logged.

Ciju
john_c@ociweb.com
Comment 1 Frank.Rehberger 2006-02-02 17:47:13 CST
Looks fine for me, does not interfere with 2313.
Comment 2 ciju john 2006-03-15 08:54:38 CST
The fix was tested and added in

Tue Mar 14 08:12:55 UTC 2006  Ciju John  <john_c@ociweb.com>
Comment 3 Johnny Willemsen 2006-03-15 09:04:21 CST
this should be rechecked with the findings of bug 2395 
Comment 4 Johnny Willemsen 2006-03-17 02:59:13 CST
Reopening this, seems not be 100%.

Ciju- 

	I may be naive here..
[...]
> Heres how things happened before the patch.
> 
> 1. Thread 1 enters handle_input() for handle 10 (10 is 
> suspended) 2. Thread 1 closes handle 10.

Have things changed recently? TAO never used to close handles until the
handler was destroyed. Was that changed?

> 3. Thread 2 accepts connection and gets handle 10 4. Thread 2 
> enters handle_input() for handle 10 (10 is suspended) 5. 
> Thread 1 resumes handle 10.
> 6. you are ...

This shouldn't happen. Please see my last mail to Steve. TAO used to
prevent the problems (that I point out in Steve's emil) by making sure
that the handle is not closed until the handler is alive. Handler will
be alive until the refcount goes down to 0. At that point of time,
things used to be consistent within the reactor that this problem could
be moot. What changed inbetween?

> My patch will prevent step 5. I don't see how any ACE fixes 
> will prevent that.


Please see above. 

Thanks
Bala
 
Comment 5 ciju john 2006-04-17 09:32:14 CDT
I just noticed that this report has been reopened. Heres my response to the last
comments on this.

>Ciju- 
>
>	I may be naive here..
>[...]
>> Heres how things happened before the patch.
>> 
>> 1. Thread 1 enters handle_input() for handle 10 (10 is 
>> suspended) 2. Thread 1 closes handle 10.

>Have things changed recently? TAO never used to close handles until the
>handler was destroyed. Was that changed?

You are right the handle gets closed. However this is the sequence in which
things happened:

handle_input() > suspend() > read() > unsuspend() > destroy handle()

The unsuspesion was automatically after every read irrespective of if the
connection was being closed down. This is arace condition.

>> 3. Thread 2 accepts connection and gets handle 10 4. Thread 2 
>> enters handle_input() for handle 10 (10 is suspended) 5. 
>> Thread 1 resumes handle 10.
>> 6. you are ...
>
>This shouldn't happen. Please see my last mail to Steve. TAO used to
>prevent the problems (that I point out in Steve's emil) by making sure
>that the handle is not closed until the handler is alive. Handler will
>be alive until the refcount goes down to 0. At that point of time,
>things used to be consistent within the reactor that this problem could
>be moot. What changed inbetween?

Handle resumption doesn't have anything to with whether of not the handle has
been closed. The TAO_Resume_Handle object in
TAO_Connection_Handler::handle_input_internal() automatically unsuspecded the
handle. At this point the handle hasn't been closed yet. By unsuspending a
handle thats in the process of closing the reactor will dispatch another thread
to finish the closure. Thus two threads are in a connection closure race. Since
this is a race condition, I don't have a regression test for it. However we have
seen this patch solve the described problem in two seperate cases. If this
clarifies Balas concerns can I close this report?

Ciju

>> My patch will prevent step 5. I don't see how any ACE fixes 
>> will prevent that.


>Please see above. 

>Thanks
>Bala