Please report new issues athttps://github.com/DOCGroup
By inspection all the parts of this bug are still present in the DOC group trunk. - server-side sockets are non-blocking - blocking flushing strategy just runs a loop around handle_output - the OS can return -1/EWOULDBLOCK at any time for a non-blocking socket Therefore TAO can be stuck in this loop at 100% CPU for an arbitrary length of time. Proposed solution: TAO_Block_Flushing_Strategy::flush_transport() and flush_message() should put the socket in blocking mode and restore the previous mode before they return. This could be done in the "guard" style similar to mutex lock/unlock.
Created attachment 1167 [details] Proposed patch
get_blocking_io can be const
(In reply to comment #2) > get_blocking_io can be const > OK, I changed it in my local workspace.
After the fix for this bug was reverted the fix for 3682 doesn't work any more. The problem is that send can return on solaris with errno=EAGAIN and in this case drain_queue_helper() returns 0 which leads to handle_output() returns 0 too and the reactor doesn't call this handle any more.
(In reply to comment #4) > After the fix for this bug was reverted the fix for 3682 doesn't work any more. The fix for this bug was never committed, so it was also never reverted. > The problem is that send can return on solaris with errno=EAGAIN and in this > case drain_queue_helper() returns 0 which leads to handle_output() returns 0 > too and the reactor doesn't call this handle any more. That should be fixed once this fix is indeed committed. Do you see the 3682 regression test failing now (after June 18)?
(In reply to comment #5) > (In reply to comment #4) > > After the fix for this bug was reverted the fix for 3682 doesn't work any more. > > The fix for this bug was never committed, so it was also never reverted. Sorry, I didn't read carefully the changelog for commit 85701 which mentions this bug. Actually the above commit broke the fix for 3682. > > > The problem is that send can return on solaris with errno=EAGAIN and in this > > case drain_queue_helper() returns 0 which leads to handle_output() returns 0 > > too and the reactor doesn't call this handle any more. > > That should be fixed once this fix is indeed committed. Do you see the 3682 > regression test failing now (after June 18)? > 3682 doesn't have own test but it's reproduced by Single_Read and AMH_Oneway tests. They are failing now.
(In reply to comment #6) > (In reply to comment #5) > > (In reply to comment #4) > > > After the fix for this bug was reverted the fix for 3682 doesn't work any more. > > > > The fix for this bug was never committed, so it was also never reverted. > > Sorry, I didn't read carefully the changelog for commit 85701 which mentions > this bug. Actually the above commit broke the fix for 3682. > > > > > > The problem is that send can return on solaris with errno=EAGAIN and in this > > > case drain_queue_helper() returns 0 which leads to handle_output() returns 0 > > > too and the reactor doesn't call this handle any more. > > > > That should be fixed once this fix is indeed committed. Do you see the 3682 > > regression test failing now (after June 18)? > > > > 3682 doesn't have own test but it's reproduced by Single_Read and AMH_Oneway > tests. They are failing now. > OK, I think those should be fixed once I commit this patch -- which should be today. I will be checking all of the tests once I commit it but please let me know if I miss something.
After discussing this with a colleague I'm going to try a different approach. The problem with the current proposed patch is that it sets the socket to blocking mode which will also impact the read side of the socket. The read side is registered with the reactor, so if the flushing strategy were to set blocking mode in between the reactor noticing the handle is ready and handle_input making the read call, that read could block (depending on how much data is available) and the thread would be stuck in handle_input. The new approach would be to use ACE::handle_write_ready() in the blocking flushing strategy. It requires the ACE_HANDLE so it can't be used directly but will need to delegate to the connection handler. I'll start working on the code for this.
Created attachment 1169 [details] Revised patch, using the new approach
(In reply to comment #9) > Created an attachment (id=1169) [details] > Revised patch, using the new approach > I checked this patch together with 3682. On solaris those two Single_Read and AMH_Oneway do work. I'm trying to check other tests I selected for testing 3682 on solaris.
With this patch and maximum debug level I get sigill on solaris in line Transport.cpp:1104 ACE_DEBUG ((LM_DEBUG, ACE_TEXT ("TAO (%P|%t) - Transport[%d]::drain_queue_i, ") ACE_TEXT ("helper retval = %d\n"), this->id (), retval)); and the same thing is in line Transport.cpp:1077 ACE_DEBUG ((LM_DEBUG, ACE_TEXT ("TAO (%P|%t) - Transport[%d]::drain_queue_i, ") ACE_TEXT ("helper retval = %d\n"), this->id (), retval)); retval is object now not int.
(In reply to comment #11) > With this patch and maximum debug level I get sigill on solaris in line > Transport.cpp:1104 > > and the same thing is in line Transport.cpp:1077 > > retval is object now not int. > Thanks, I'll fix those.
Fri Jun 26 20:41:53 UTC 2009 Adam Mitz <mitza@ociweb.com> * tao/Block_Flushing_Strategy.h: * tao/Block_Flushing_Strategy.cpp: * tao/Connection_Handler.h: * tao/Connection_Handler.inl: * tao/Connection_Handler.cpp: * tao/IIOP_Connection_Handler.h: * tao/IIOP_Connection_Handler.cpp: * tao/Transport.h: * tao/Transport.cpp: This resolves bug #3704. TAO_Transport::handle_output() now uses an enum return value to distinguish the EWOULDBLOCK case from other normal or error conditions. The Block_Flushing_Strategy checks for EWOULDBLOCK and calls a new method, handle_write_ready(), on the Connection Handler. The Connection Handler calls ACE::handle_write_ready() passing the specific handle. * tao/Strategies/DIOP_Connection_Handler.h: * tao/Strategies/DIOP_Connection_Handler.cpp: * tao/Strategies/SCIOP_Connection_Handler.h: * tao/Strategies/SCIOP_Connection_Handler.cpp: * tao/Strategies/SHMIOP_Connection_Handler.h: * tao/Strategies/SHMIOP_Connection_Handler.cpp: * tao/Strategies/UIOP_Connection_Handler.h: * tao/Strategies/UIOP_Connection_Handler.cpp: * orbsvcs/orbsvcs/HTIOP/HTIOP_Connection_Handler.h: * orbsvcs/orbsvcs/HTIOP/HTIOP_Connection_Handler.cpp: * orbsvcs/orbsvcs/PortableGroup/UIPMC_Connection_Handler.h: * orbsvcs/orbsvcs/PortableGroup/UIPMC_Connection_Handler.cpp: * orbsvcs/orbsvcs/PortableGroup/UIPMC_Mcast_Connection_Handler.h: * orbsvcs/orbsvcs/PortableGroup/UIPMC_Mcast_Connection_Handler.cpp: * orbsvcs/orbsvcs/SSLIOP/SSLIOP_Connection_Handler.h: * orbsvcs/orbsvcs/SSLIOP/SSLIOP_Connection_Handler.cpp: Added handle_write_ready() to the non-IIOP protocols.