Please report new issues athttps://github.com/DOCGroup
When the client closes its connections, it sends an EOF to the server. The sever reactor gets an event for that handler. It first suspends the handler and then dispatches a thread to handle_input(). Some where down the lane, a read() is done on the handle and automatically the handle is resumed. At this point another thread CAN enter the reactor and since the handle being closed has been resumed, the reactor will add it to its fd_set. As expected the reactor gets an event on that handle and dispatches someone else to handle that event. In rare conditions, this will lead to an appliation core as the second thread assumes that an event trigger == valid handle. What this patch (seee below) does is prevent the closing handler from being resumed. Thus it never does get added to the reactor fd_set and the possibility of getting a nil handler is removed. Index: Connection_Handler.cpp =================================================================== RCS file: /project/cvs-repository/ACE_wrappers-repository/TAO/tao/Connection_Handler.cpp,v retrieving revision 1.62 diff -u -r1.62 Connection_Handler.cpp --- Connection_Handler.cpp 2 Nov 2005 07:13:02 -0000 1.62 +++ Connection_Handler.cpp 2 Feb 2006 19:45:36 -0000 @@ -282,6 +282,8 @@ t_id, handle, h, return_value)); } + if (return_value == -1) + resume_handle.set_flag (TAO_Resume_Handle::TAO_HANDLE_LEAVE_SUSPENDED); return return_value; } This race is easy to verify. Start up any multi-threaded server with debuglevel == 10. When a client disconnects, the server log will show two threads trying to close the same handle. Post-patch only one connection closure is logged. Ciju john_c@ociweb.com
Looks fine for me, does not interfere with 2313.
The fix was tested and added in Tue Mar 14 08:12:55 UTC 2006 Ciju John <john_c@ociweb.com>
this should be rechecked with the findings of bug 2395
Reopening this, seems not be 100%. Ciju- I may be naive here.. [...] > Heres how things happened before the patch. > > 1. Thread 1 enters handle_input() for handle 10 (10 is > suspended) 2. Thread 1 closes handle 10. Have things changed recently? TAO never used to close handles until the handler was destroyed. Was that changed? > 3. Thread 2 accepts connection and gets handle 10 4. Thread 2 > enters handle_input() for handle 10 (10 is suspended) 5. > Thread 1 resumes handle 10. > 6. you are ... This shouldn't happen. Please see my last mail to Steve. TAO used to prevent the problems (that I point out in Steve's emil) by making sure that the handle is not closed until the handler is alive. Handler will be alive until the refcount goes down to 0. At that point of time, things used to be consistent within the reactor that this problem could be moot. What changed inbetween? > My patch will prevent step 5. I don't see how any ACE fixes > will prevent that. Please see above. Thanks Bala
I just noticed that this report has been reopened. Heres my response to the last comments on this. >Ciju- > > I may be naive here.. >[...] >> Heres how things happened before the patch. >> >> 1. Thread 1 enters handle_input() for handle 10 (10 is >> suspended) 2. Thread 1 closes handle 10. >Have things changed recently? TAO never used to close handles until the >handler was destroyed. Was that changed? You are right the handle gets closed. However this is the sequence in which things happened: handle_input() > suspend() > read() > unsuspend() > destroy handle() The unsuspesion was automatically after every read irrespective of if the connection was being closed down. This is arace condition. >> 3. Thread 2 accepts connection and gets handle 10 4. Thread 2 >> enters handle_input() for handle 10 (10 is suspended) 5. >> Thread 1 resumes handle 10. >> 6. you are ... > >This shouldn't happen. Please see my last mail to Steve. TAO used to >prevent the problems (that I point out in Steve's emil) by making sure >that the handle is not closed until the handler is alive. Handler will >be alive until the refcount goes down to 0. At that point of time, >things used to be consistent within the reactor that this problem could >be moot. What changed inbetween? Handle resumption doesn't have anything to with whether of not the handle has been closed. The TAO_Resume_Handle object in TAO_Connection_Handler::handle_input_internal() automatically unsuspecded the handle. At this point the handle hasn't been closed yet. By unsuspending a handle thats in the process of closing the reactor will dispatch another thread to finish the closure. Thus two threads are in a connection closure race. Since this is a race condition, I don't have a regression test for it. However we have seen this patch solve the described problem in two seperate cases. If this clarifies Balas concerns can I close this report? Ciju >> My patch will prevent step 5. I don't see how any ACE fixes >> will prevent that. >Please see above. >Thanks >Bala