Bug 3178

Summary: ACE_Dev_Poll_Reactor fails remove part of Reactor_Remove_Resume_Test
Product: ACE Reporter: Johnny Willemsen <jwillemsen>
Component: ACE CoreAssignee: DOC Center Support List (internal) <tao-support>
Status: RESOLVED FIXED    
Severity: normal CC: shuston
Priority: P3    
Version: 5.6.2   
Hardware: All   
OS: All   
Bug Depends on:    
Bug Blocks: 3179, 3188    

Description Johnny Willemsen 2007-12-24 04:58:30 CST
The dev_poll reactor currently doesn't suspend/resume event handlers, for TAO we do need this. We can add a TP_Dev_Poll reactor, but it is nicer to have a strategy to the dev_poll reactor
Comment 1 Johnny Willemsen 2007-12-24 04:59:41 CST
added block
Comment 2 Steve Huston 2008-01-03 18:25:12 CST
Why do you think ACE_Dev_Poll_Reactor doesn't suspend/remove handlers?
Comment 3 Johnny Willemsen 2008-01-04 02:10:13 CST
Because there have been reports from users that it doesn't do so and also with TAO we do see problems that multiple threads try to read from the same socket at the same time
Comment 4 Steve Huston 2008-01-04 13:50:00 CST
Well, it does do suspend/resume. It has a problem in the Reactor_Remove_Resume_Test, which may explain the TAO issues. If you think so, let's change this title. I figured out why the test fails but don't have a fix yet.
Comment 5 Johnny Willemsen 2008-01-04 13:54:52 CST
Feel free to update the summary to the problem you think is there
Comment 6 Steve Huston 2008-01-04 14:40:14 CST
Reactor_Remove_Resume_Test fails the remove part when using ACE_Dev_Poll_Reactor. This failure occurs because when a callback returns -1 to close the handler, the reactor calls back the handler currently registered for the handle that triggered the callback, not the handler that was called back. This error occurs when a particular handle ends up being registered to a different handler (such as when a handle is closed then reused quickly), the called back handler no longer controls the handle, and the callback returns -1.
Comment 7 Johnny Willemsen 2008-01-05 11:49:52 CST
The problem for TAO is probably the fact that the dev_poll reactor doesn't check the event_handler->resume_handler() operation. TAO does resume handlers as application in some cases, the reactor should do that. The TP_Reactor is the only reactor that uses this method (as a note in the header of eh says), but the dev_poll should also check this method and not always resume the handle
Comment 8 Johnny Willemsen 2008-01-06 00:54:29 CST
I think my original post was unclear, the dev poll reactor does have suspend/resume methods, but the event handler is not automatically suspended when it is used by one thread, as done by the tp_reactor. This is what we need for TAO and I can't find this back in the code. 
Comment 9 Steve Huston 2008-01-07 11:10:39 CST
Ah, I see now. What is the factor that made addition of a manual-resume hack necessary for ACE_TP_Reactor? The behavior of ACE_Dev_Poll_Reactor need not be level-triggered, if that would help.

I would hate to hack in a capability just because ACE_TP_Reactor does it that way. Since I don't know the innards of TAO and why this is an important feature, I need your help to understand the rationale and whether or not there's a better way to address it.
Comment 10 Johnny Willemsen 2008-01-07 11:22:28 CST
(In reply to comment #9)
> Ah, I see now. What is the factor that made addition of a manual-resume hack
> necessary for ACE_TP_Reactor? The behavior of ACE_Dev_Poll_Reactor need not be
> level-triggered, if that would help.

TAO is written in such a way that only one thread handles a certain connection at a moment. Also TAO does indicate to the reactor who resumes the handle, see bug 3188. 

> I would hate to hack in a capability just because ACE_TP_Reactor does it that
> way. Since I don't know the innards of TAO and why this is an important
> feature, I need your help to understand the rationale and whether or not
> there's a better way to address it.

I was more thinking in ways of a policy to control the suspend/resume, not a full new class. The same could maybe be done for the TP_Reactor, change the select reactor to have a policy that controls suspend/resume behaviour
Comment 11 Steve Huston 2008-05-30 12:19:16 CDT
As further background, during a discussion with Bala I asked why TAO needs the varying suspend/resume behavior. Here is his response:

Resume() stuff is/was necessary for the following reasons:

(1) Nested Upcall -- When a incoming call (from peer A) to servant (in
peer B) makes an outbound call to peer A, which in turn (from Peer A)
makes a call to peer B, the ORB can deadlock. How? Peer A can use the
same connection to send a request back to peer B. If the connection is
in suspended state, peer B cannot see the message. By resuming the
connection we could get around this deadlock

(2) There were select()'isms in TAO code. I am not sure how much of them
have been removed. For example, there used to be paths where a thread
can pick up parts of the message that it requires, leaving rest of the
message in the socket. The thread that picks up part of the message can
release the connection for another thread to be sent down to the
handler. I know TAO fixed some of this, but I am not 100% sure whether
all the select()'isms have been removed. TAO should have removed them to
work well with SSL. But I am not sure how much SSLIOP has been tested
with TAO to be confident enough.
Comment 12 Steve Huston 2009-06-11 20:22:41 CDT
Fixed: Fri Jun 12 00:45:09 UTC 2009  Steve Huston  <shuston@riverace.com>