The ACE_TP_Reactor always dispatches the lowest-numbered ready socket first. If the incoming sockets are busy, the first few can tie up all the available threads, and higher-numbered sockets end up starved. There are two possible approaches to fixing this:

1) Fix ACE_Handle_Set and ACE_Handle_Set_Iterator to remember the last socket handled, so that each newly created iterator starts with the next socket. The iterator must then wrap around the end of the set and start over from the beginning, rather than just running from start to end. This would affect all the other users of ACE_Handle_Set, and might also require platform-specific implementations, because handle sets differ substantially between the Unix Select_Reactor and the WFMO_Reactor. (A sketch of this wrap-around idea follows this report.)

2) Modify ACE_TP_Reactor to cache the Handle_Set_Iterator in ACE_TP_Reactor::dispatch_io_set() and eventually dispatch all the active I/O handles in the set, rather than re-doing the select() for every thread that enters the event dispatch method.

I'd be happy to implement and contribute either solution, but I wanted to get opinions on which would be preferred.

Irving Reid
Senior Software Developer
Nevex Software Technologies, Inc.
irving@nevex.com
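A minimal sketch of approach 1, the wrap-around iteration. The names below (RoundRobinDispatcher, last_dispatched_) are hypothetical illustrations, not ACE APIs; a real fix would live inside ACE_Handle_Set_Iterator. It assumes the ready handles arrive sorted ascending, as they would when walking an fd_set.

    #include <cstddef>
    #include <vector>

    using Handle = int;

    class RoundRobinDispatcher {
    public:
      // Dispatch each ready handle exactly once, starting just past the
      // handle served last time so low-numbered handles cannot starve
      // the rest.  <ready> is assumed sorted ascending.
      template <typename Callback>
      void dispatch_all(const std::vector<Handle>& ready, Callback cb) {
        if (ready.empty())
          return;
        // Find the first handle strictly greater than the last one
        // served; if there is none, wrap to the beginning of the set.
        std::size_t start = 0;
        while (start < ready.size() && ready[start] <= last_dispatched_)
          ++start;
        if (start == ready.size())
          start = 0;
        // Walk the whole set once, wrapping around the end.
        for (std::size_t i = 0; i < ready.size(); ++i) {
          Handle h = ready[(start + i) % ready.size()];
          cb(h);
          last_dispatched_ = h;
        }
      }

    private:
      Handle last_dispatched_ = -1;  // -1: nothing dispatched yet
    };

The key design point is that the iterator's starting position persists across calls; this is the state that would have to be threaded through ACE_Handle_Set_Iterator, and made portable across the differing Select_Reactor and WFMO_Reactor handle-set representations.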
I think I've wrapped my head around the ACE_Select_Reactor and ACE_TP_Reactor code now, and there may be another related problem: As far as I can tell, all timer handlers are dispatched synchronously, while the leader thread is still holding the lock on the TP_Reactor. This looks like it would be quite a bit harder to solve. We're not using timer handlers in our application (for now), so I'm inclined not to try to fix this problem. - irving -
Reassigned to tao-support.
Accepted by tao-support.
I wanted to capture the design discussion that took place to fix this bug in Bugzilla:

________________________________________
Irfan replies to Irving:

> Modify ACE_TP_Reactor to cache the Handle_Set_Iterator in
> ACE_TP_Reactor::dispatch_io_set() and eventually dispatch all the
> active IO handles in the set, rather than re-doing the select() for
> every thread that enters the event dispatch method.

Right, this sounds like the right approach. Since only one thread at a time calls select(), you will not require extra synchronization for the cached handle set. You can keep a flag that indicates whether there are outstanding handles to be dispatched and check this before doing the select(). If the flag is set, set the <dispatch_info> from a handle in the cached set. If not, do the regular select().

The only problem I can think of is that an event handler may be removed from the Reactor before its handle is dispatched from the cached set. To avoid these caching problems, reset the above flag on any changes to the handler repository (on any register_handler()/remove_handler() calls), i.e., ignore the cached set and do a new select(). (A code sketch of this scheme appears at the end of this comment.)

________________________________________
Irfan replies to Carlos and Irving:

> I've come up with a couple of possible implementations.

Cool!

> The quick hack would be to modify ACE_TP_Reactor::dispatch_io_set()
> to save the handle set iterator, callback, mask, etc.
> ACE_TP_Reactor::handle_events() then needs to:
>
>   grab the lock
>   clear the cache if the handle set has been changed by
>     register_handler()/remove_handler()
>   re-do the select if the handle set iterator is empty
>   take the first handle from the set
>   release the lock
>   execute the callback

Right, except you can do better by overriding dispatch_io_handlers() and storing an ACE_Select_Reactor_Handle_Set rather than the ACE_Handle_Set.

> This isn't too different from the existing TP_Reactor implementation.
> The disadvantages of this are:
>
> - because dispatch_io_set() is called once for each type of select()
>   bitmask (write, exception, read), what will really happen is
>   something like:
>     if any sockets are ready to write, handle them all in turn
>     else if any sockets have exceptions, handle them all in turn
>     else if any sockets are ready to read, handle them all in turn

Right. I am not too worried about this since this is the same way that the Select_Reactor has worked all along.

> - this still depends on select() continuing to return true for
>   pending I/O operations, so that it's safe to throw away the existing
>   handle iterator and re-do the select. This isn't true for timer
>   events, so we can't use the same technique to solve the further
>   problem that I appended to bug #567.

Right.

> The other, more flexible solution would be to keep a queue of
> structures something like the existing ACE_EH_Dispatch_Info, for all
> types of event. This could be implemented in the Select_Reactor and
> inherited by TP_Reactor.
>
> The rough design would be something like:
>
> ...
>
> Now, the only difference between Select_Reactor and TP_Reactor is that
> Select_Reactor dispatches all the events on the queue and assumes the
> event handlers will do what they want with threading, while the
> TP_Reactor just takes the first event, releases the locks so the next
> thread can get the next event, and then dispatches.

This sounds interesting, but the overhead of queue management and memory allocation may be difficult to get right (especially in MT code) and make efficient.
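A minimal sketch of the caching scheme from Irfan's first reply: one thread runs select() and caches the ready handles, later threads drain the cache, and any register_handler()/remove_handler() call invalidates it. The names here (CachedSelect, handler_repository_changed(), ...) are hypothetical stand-ins, not the actual ACE interfaces, and the std::mutex stands in for the reactor token.

    #include <deque>
    #include <mutex>

    using Handle = int;

    class CachedSelect {
    public:
      // Fetch the next handle to dispatch.  Only re-runs select() when
      // the cache is invalid or exhausted; returns false if nothing is
      // ready.
      bool next_dispatch(Handle& out) {
        std::lock_guard<std::mutex> guard(lock_);
        if (!cache_valid_ || cache_.empty()) {
          cache_ = do_select();  // refill from a fresh select()
          cache_valid_ = true;
        }
        if (cache_.empty())
          return false;
        out = cache_.front();
        cache_.pop_front();
        return true;
      }

      // Called on any register_handler()/remove_handler(): the cached
      // handles may refer to handlers that are gone, so force a
      // re-select.
      void handler_repository_changed() {
        std::lock_guard<std::mutex> guard(lock_);
        cache_valid_ = false;
        cache_.clear();
      }

    private:
      std::deque<Handle> do_select() {
        // Stub: a real implementation would run ::select() here and
        // collect the ready handles.
        return {};
      }

      std::mutex lock_;           // stands in for the reactor token
      std::deque<Handle> cache_;  // ready handles from the last select()
      bool cache_valid_ = false;  // reset when the repository changes
    };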
Carlos, why do you let go of the lock for the select() call?

> Actually this would be more like this:
>
>   grab the lock
>   clear the cache if the handle set has been changed by
>     register_handler()/remove_handler()
>   mark the current thread as the leader thread
>   release the lock
>
>   re-do the select if the handle set iterator is empty
>
>   grab the lock
>   take the first handle from the set
>   suspend the handle
>   the current thread is not the leader anymore, elect a new leader
>   release the lock
>
>   execute the callback
>
>   grab the lock
>   resume the handle
>   release the lock

So here is my algorithm:

1. grab the lock
2. check if the cache is valid or the iterator has more elements

   if 2 is true:
   2a. check if the handler is already suspended;
       if so, skip the current handle in the cache and go to 2
   2b. suspend the current handle
   2c. release the lock
   2d. dispatch
   2e. reacquire the lock
   2f. resume the handle
   2g. release the lock

3. if 2 is false, call select() and resume from 2

Note that you'll have to distinguish between a suspend/resume from the above code and a suspend/resume from the user. If it comes from the user, you'll have to clear the cache; the cache should remain valid for the suspend/resume calls you make. (A code sketch of this loop appears at the end of this comment.)

________________________________________
Irfan replies to Carlos:

> > > Hmmm, good question, I just don't like to have locks held for a
> > > long time. What if another thread needs to grab it, for example,
> > > because it is returning from the upcall? Is it going to be
> > > blocked there forever?
> >
> > It'll wake up the thread calling select() through the magical
> > sleep_hook of the token.
>
> Ahhh, that works.
>
> I wonder if that and the avoided context switch (to wake up the
> thread blocked on select()) would result in improved performance
> overall....

Don't think so, since you still have to wake up the thread calling select() so that it can get the updated handle set.
________________________________________
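A minimal sketch of the numbered algorithm above, assuming a handle cache like the one in the previous sketch. Everything here (TPDispatchSketch, dispatch_i(), the std::mutex in place of the reactor token) is a hypothetical stand-in for the TP_Reactor internals, not the shipped code. Note how a user-initiated suspend clears the cache while the loop's own internal suspend/resume leaves it valid, as the final note requires.

    #include <deque>
    #include <mutex>
    #include <set>

    using Handle = int;

    class TPDispatchSketch {
    public:
      // One pass of the event loop, following steps 1-3 above.
      void handle_one_event() {
        std::unique_lock<std::mutex> token(lock_);  // 1. grab the lock
        for (;;) {
          if (cache_.empty()) {
            refill_cache_from_select();             // 3. empty: select()
            if (cache_.empty())
              return;                               // nothing ready
          }
          Handle h = cache_.front();                // 2. next cached handle
          cache_.pop_front();
          if (user_suspended_.count(h) != 0 ||      // 2a. already suspended:
              in_dispatch_.count(h) != 0)           //     skip, go back to 2
            continue;
          in_dispatch_.insert(h);                   // 2b. internal suspend
          token.unlock();                           // 2c. release the lock
          dispatch_i(h);                            // 2d. run the upcall
          token.lock();                             // 2e. reacquire the lock
          in_dispatch_.erase(h);                    // 2f. internal resume;
          return;                                   //     cache stays valid
        }                                           // 2g. lock released on
      }                                             //     scope exit

      // A suspend coming from the user, not from the dispatch loop: the
      // cached ready set may now be stale, so throw it away.
      void suspend_handler(Handle h) {
        std::lock_guard<std::mutex> guard(lock_);
        user_suspended_.insert(h);
        cache_.clear();
      }

    private:
      void refill_cache_from_select() { /* stub: would run ::select() */ }
      void dispatch_i(Handle) { /* stub: would invoke the handler */ }

      std::mutex lock_;                 // stands in for the reactor token
      std::deque<Handle> cache_;        // leftovers from the last select()
      std::set<Handle> in_dispatch_;    // internally suspended handles
      std::set<Handle> user_suspended_; // handles suspended by the user
    };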
This is now fixed, using an algorithm very close to the one suggested above. However, the event-handling call chain no longer winds through ACE_Select_Reactor and back. This avoids re-doing the select() on the handles when it is not needed. The problem of the TP_Reactor dispatching all of the signals, timers, and notifications remains; I've moved it to Bugzilla #651 to track, since the main point of this report is now resolved.