Bug 1031 - TP_Reactor enhancements
Summary: TP_Reactor enhancements
Status: REOPENED
Alias: None
Product: ACE
Classification: Unclassified
Component: ACE Core (show other bugs)
Version: 5.1.19
Hardware: All All
: P3 enhancement
Assignee: DOC Center Support List (internal)
URL:
: 1074 (view as bug list)
Depends on:
Blocks: 982 1038 943 1020
  Show dependency tree
 
Reported: 2001-09-14 18:23 CDT by Nanbor Wang
Modified: 2009-12-29 14:53 CST (History)
2 users (show)

See Also:


Attachments
One possible implementation and use of event handler ref counting to prevent the race condition of bugs 1074 and 1020. (14.65 KB, patch)
2001-10-29 19:39 CST, gclark
Details
Bug fix in event handler ref counting implementation (2.34 KB, patch)
2001-11-08 15:37 CST, gclark
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Nanbor Wang 2001-09-14 18:23:30 CDT
We need to do the following enhancements to the TP_Reactor after the release 
goes out.

- The TPR (Thread Pool Reactor) does not guarantee fairness to the handle sets. 
It gives unfair advantage to the notify handle. Further it gives more weightage 
to handles that appear first in the handle set. If we happen to keep getting 
lots of events on say handle 5, 6, 7 & 8 and there is only one thread in the 
pool. The thread will first dispatch events on handle 5. After dispatching 5 if 
the thread happens to call select again, and if there is a event on 5,  the 
thread dispatches 5 again, thereby making the events on 6, 7 & 8  to wait. We 
need to remedy this. (The description is not accurate to the word but 
represents an idea of the problem that we have). This affects Oneway_Buffering 
tests in TAO.

- The TPR's (or for that matter the Reactor's) handle set needs to be reference 
counted.  We see race conditions like the one in bug 1020. The problem comes 
when one of threads is making an upcall to the handler and another thread tries 
to remove the handler from the Reactors handle set. The race is between one of 
the threads (the leader) in the reactor holding the reference to the handler, 
which is being dispatched, and another thread potentially trying to remove the 
same handler from the reactor.

- We may want to have a Single threaded reactor in ACE derived from TPR (for 
TAO specifically) instead of using the select's ST Reactor. 

Others if you have ideas please put them here.  I am not adding Don's idea. I 
will request him to add that.
Comment 1 Don Hinton <dhinton@gmx.net> 2001-09-25 04:39:51 CDT
The ACE_Select_Reactor family of classes, which included TPR, use ACE_Token to implement a FIFO thread scheduling mechanism.  But this isn't always the most efficient.  Therefore, adding the ability to select a thread scheduling strategy, e.g., LIFO, either by deriving from or strategizing ACE_Token, could increase performance.  
Comment 2 Don Hinton <dhinton@gmx.net> 2001-09-25 04:47:29 CDT
Added myself to cc list
Comment 3 Nanbor Wang 2001-09-25 15:59:23 CDT
Accepting this one too..
Comment 4 Nanbor Wang 2001-10-23 14:25:31 CDT
This is after a long discussion with Irfan on this.  Looks like the enhanced
reactor should have a way to dispatch to the handles based on priority of the
handle/connection.  A naive dispatching mechanism would hurt RTCORBA a lot.  For
example  the reactor could manage 3 connections opened up by threads of
different priorities  from the client. If the messages are handled naively we
could easily be servicing the lowest priority connection. (This seems definitely
possible if the server thread has a zero priority and uses client propagated
policy model). 

A solution for this problem is to allow setting of priorities to the handles so
that dispatching can be done after reodering the handlesets based on priority.
The methods can be on the Reactor so that an application would be able to change
the priority of the handle at run time.

Irfan , if you would like to add more information, please go ahead.
Comment 5 Nanbor Wang 2001-10-23 19:10:57 CDT
*** Bug 1074 has been marked as a duplicate of this bug. ***
Comment 6 Nanbor Wang 2001-10-23 19:11:43 CDT
Look at bug 1074 for a test case 
Comment 7 gclark 2001-10-29 19:39:08 CST
Created attachment 94 [details]
One possible implementation and use of event handler ref counting to prevent the race condition of bugs 1074 and 1020.
Comment 8 gclark 2001-10-29 19:45:47 CST
If it's of any interest, the previous attachment shows changes that we 
are using, that add reference counting to ACE_Event_Handler, and that use these
in place of the pending_upcalls counter in the TP_Reactor to control the
lifetime of IIOP connection handlers.  This fixes the race condition in bug 1074
(and presumably that in bug 1020 as well, although we have not tested the latter).

Here's a summary of the changes:

  ACE_Event_Handler changes:
    - Add add_ref() and remove_ref() methods.
      add_ref() increments a reference count, and remove_ref()
      decrements the count, deleting the event handler 
      object when the counter reaches zero.
    - Initialize the ref count to 1 in the constructor.

  ACE_TP_Reactor changes:
    - Change handle_socket_events() so that it 
      increments the event handler's ref count before 
      dispatching to its callback method, and decrements
      it afterwards.

  TAO_IIOP_Connection_Handler changes:
    - Change handle_close() to call handle_close_i() unconditionally, 
      rather than checking pending_upcalls first.

    - Change handle_close_i() to call remove_ref(this) instead of 
      directly deleting itself.  The destruction of the event 
      handler and associated Transport object will be done by the 
      connection handler's destructor.
  
    - In handle_input(), remove the increment, decrement, and check
      of the pending_upcalls counter.  Don't call handle_close_i().

The changes apply to the following files:
  ACE_wrappers/ace/Event_Handler.h v 4.41
  ACE_wrappers/ace/Event_Handler.cpp v 4.19
  ACE_wrappers/ace/TP_Reactor.cpp v 4.47
  ACE_wrappers/TAO/tao/IIOP_Connection_Handler.cpp v 1.46
Comment 9 gclark 2001-11-08 15:37:09 CST
Created attachment 95 [details]
Bug fix in event handler ref counting implementation
Comment 10 gclark 2001-11-08 15:41:17 CST
The previous attachment is diffs to ace/Select_Reactor_Base.cpp that
prevent dispatch_notify() from closing the handle twice if the event
handler callback detects an error.

This fix applies to the event handler mods of attachment 94 [details].
Comment 11 Nanbor Wang 2002-06-18 22:45:47 CDT
Another feature which is needed and which has not been captured explicitly is 
this

- When the Reactor holds a refcounted handler in its internal map, the handler 
would not be deleted till all the references to the handler goes off. In other 
words the external references to the handler should be zero. In the mean time 
if a connection is accepted for the same handle (this is very much possible), 
the handler shoudl not be added to the internal maps immediately. The addition 
should be delayed to make sure that all the references to the original handler 
go away.

Comment 12 Nanbor Wang 2002-06-19 11:16:47 CDT
The TP_Reactor does not have signal handling capabilities. Needs to be 
implemented.
Comment 13 Nanbor Wang 2002-06-19 15:46:55 CDT
Yet another use case that the enhancements should take care is the following

- The reactor code releases the guard and then uses an event handler.  Yes, the 
handler is suspended to avoid multiple threads from calling handle_XXX(), but 
there is no guarantee that the application will not call remove_handler(..., 
DONT_CALL) and then remove the handler.  Nor can the application find out if 
the handle is suspended and therefore in the middle of an upcall.
    In other words, the TP_Reactor is at risk of using released memory, small
risk, but a real one.

The solution to this reference counting and a new memory management scheme.  
Comment 14 Irfan Pyarali 2003-08-14 14:09:51 CDT
Fixed. This ChangeLogTag: 

Mon Jul 07 18:00:38 2003  Irfan Pyarali  <irfan@oomworks.com>

describes the proper fixes to these problems.
Comment 15 Nanbor Wang 2004-10-14 14:23:04 CDT
Reopening the bug since one of the points is not addressed
Comment 16 Steve Huston 2009-12-29 14:53:19 CST
(In reply to comment #15)
> Reopening the bug since one of the points is not addressed

In particular, the fairness issue is unresolved, correct?