Bug 1816 - AMI Buffering test with timeout fails on Windows
Summary: AMI Buffering test with timeout fails on Windows
Status: REOPENED
Alias: None
Product: TAO
Classification: Unclassified
Component: ORB (show other bugs)
Version: 1.4.1
Hardware: x86 Windows NT
: P1 major
Assignee: DOC Center Support List (internal)
URL:
Depends on:
Blocks:
 
Reported: 2004-05-12 06:39 CDT by Johnny Willemsen
Modified: 2007-05-01 07:49 CDT (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Johnny Willemsen 2004-05-12 06:39:11 CDT
The TAO_ROOT/tests/AMI_Buffering tests fails when timeouts are used (start 
client with -t). The strage thing is that the test fails without debugging, at 
level 1 and 2 but at level 3 and higher the test runs. I am thinking about a 
debug line that has a side effect due to which the test runs on level 3 or 
higher. 

When running the test without debugging or level 1 or 2 I get the following 
errors in the log:
TAO (344|240) - Connection_Handler[264]::close_connection, purging entry from 
cache
TAO (344|240) - Connection_Handler[264]::close_connection, removing from the 
reactor
TAO (344|240) - Connection_Handler[264]::close_connection, cancel all timers
(344|240) Got except
(344|240) EXCEPTION, Reply handler receive_data: 
system exception, ID 'IDL:omg.org/CORBA/COMM_FAILURE:1.0'
Unknown vendor minor code id (0), minor code = 0, completed = MAYBE

At a certain moment in time the connection are closed and then we get a lot of 
failures. 

Also during testing I sometimes get the following assert:
ACE_ASSERT: file .\Transport.cpp, line 150 assertion failed for 'this->head_ == 
0'.Aborting...

I have added an ACE_ASSERT to ACE_EH_Dispatch_Info::set() to check whether a 
valid handle and event_handler* is passed, sometimes I get the following assert:
ACE_ASSERT: file ..\ace/TP_Reactor.i, line 33 assertion failed 
for 'event_handler != 0'.Aborting...
Comment 1 Johnny Willemsen 2004-05-12 06:40:05 CDT
Found the following in tao/Transport.cpp, there is an ACE_ERROR_RETURN that 
should be an ACE_ERROR with a separate return. Diff to fix below:

===================================================================
RCS file: /project/cvs-repository/ACE_wrappers-
repository/TAO/tao/Transport.cpp,v
retrieving revision 1.100
diff -u -u -r1.100 Transport.cpp
--- Transport.cpp	11 May 2004 13:00:31 -0000	1.100
+++ Transport.cpp	12 May 2004 11:27:18 -0000
@@ -1157,10 +1157,12 @@
       if (retval == -1)
         {
           if (TAO_debug_level > 2)
-            ACE_DEBUG ((LM_DEBUG,
-                        "TAO (%P|%t) - Transport[%d]::handle_input, "
-                        "error while parsing the head of the queue\n",
-                        this->id()));
+            {
+              ACE_DEBUG ((LM_DEBUG,
+                          "TAO (%P|%t) - Transport[%d]::handle_input, "
+                          "error while parsing the head of the queue\n",
+                          this->id()));
+            }
         }
 
       return retval;
@@ -1333,10 +1335,12 @@
       if (retval == -1)
         {
           if (TAO_debug_level > 2)
-            ACE_DEBUG ((LM_DEBUG,
-                        "TAO (%P|%t) - Transport[%
d]::parse_incoming_messages, "
-                        "error in incoming message\n",
-                        this->id ()));
+            {
+              ACE_DEBUG ((LM_DEBUG,
+                          "TAO (%P|%t) - Transport[%
d]::parse_incoming_messages, "
+                          "error in incoming message\n",
+                          this->id ()));
+            }
 
           return -1;
         }
@@ -1836,12 +1840,12 @@
     {
       if (TAO_debug_level)
         {
-          ACE_ERROR_RETURN ((LM_ERROR,
-                             "TAO (%P|%t) - Transport[%
d]::process_parsed_messages, "
-                             "received MessageError, closing connection\n",
-                             this->id ()),
-                            -1);
+          ACE_ERROR ((LM_ERROR,
+                     "TAO (%P|%t) - Transport[%d]::process_parsed_messages, "
+                     "received MessageError, closing connection\n",
+                     this->id ()));
         }
+      return -1;
     }
 
   // If not, just return back..

***** CVS exited normally with code 1 *****
Comment 2 Johnny Willemsen 2004-05-12 07:46:27 CDT
The problem I see looks very much like 1815. A backtrace from the problem

In the method below we do a send and this returns zero.

int
TAO_Transport::drain_queue_helper (int &iovcnt, iovec iov[])
{
  size_t byte_count = 0;

  ssize_t retval =
    this->send (iov, iovcnt, byte_count); <== returns zero

We return now -1

We are now coming back in 
int
TAO_Transport::drain_queue_i (void)
{

Becase we return a value not equal to 0, we also return -1

We go back to
int
TAO_Transport::drain_queue (void)
{
  ACE_GUARD_RETURN (ACE_Lock, ace_mon, *this->handler_lock_, -1);
  int retval = this->drain_queue_i ();

We return also -1

Come to
int
TAO_Transport::handle_output (void)
{

Return also here -1

Come to 
int
TAO_Connection_Handler::handle_output_eh (
    ACE_HANDLE, ACE_Event_Handler * eh)

We leave the handle suspended and return -1

And now we are back in 
int
TAO_IIOP_Connection_Handler::handle_output (ACE_HANDLE handle)
{

And we call
      this->close_connection ();

This gives the results that the reply handlers get an exception. 

So we have another send call returning zero. Have to debug further why it 
returns 0.

Comment 3 Johnny Willemsen 2004-05-12 08:03:37 CDT
I added a debug line to printout errno and %m when send returns zero
int
TAO_Transport::drain_queue_helper (int &iovcnt, iovec iov[])
{

The output:
TAO (400|1580) - Transport[10035]::drain_queue_helper, send() returns 0, errno 1
776,  Resource temporarily unavailable
Comment 4 Johnny Willemsen 2004-05-12 08:30:23 CDT
Fixed by applying the patch below:

===================================================================
RCS file: /project/cvs-repository/ACE_wrappers-
repository/ace/OS_NS_sys_socket.inl,v
retrieving revision 1.13
diff -u -u -r1.13 OS_NS_sys_socket.inl
--- OS_NS_sys_socket.inl	12 May 2004 06:34:59 -0000	1.13
+++ OS_NS_sys_socket.inl	12 May 2004 13:15:20 -0000
@@ -667,6 +667,7 @@
   if (result == SOCKET_ERROR)
     {
       ACE_OS::set_errno_to_wsa_last_error ();
+      return -1;
     }
 # else
   for (int i = 0; i < n; ++i)

Comment 5 Johnny Willemsen 2004-05-12 08:42:54 CDT
reopening, not everything is working still.
Comment 6 Johnny Willemsen 2004-05-12 09:18:58 CDT
I now see another problem, now in the method below the connection is closed 
before all replies from the server are received

int
TAO_IIOP_Connection_Handler::handle_input (ACE_HANDLE h)
{
  int result =
    this->handle_input_eh (h, this);

  if (result == -1)
    {
      this->close_connection ();
      return 0;
    }

  return result;
}
Comment 7 Johnny Willemsen 2004-05-12 09:32:29 CDT
In the code below, in case of a failure the recv return 0. But I have to 
mention now that at the moment this happens in the client, the server 
shutdowns, so maybe the problem is now in the server. This also not happens 
always, sometimes the tests runs without problems.

ssize_t
TAO_IIOP_Transport::recv (char *buf,
                          size_t len,
                          const ACE_Time_Value *max_wait_time)
{
  ssize_t n = this->connection_handler_->peer ().recv (buf,
                                                       len,
                                                       max_wait_time);

Comment 8 Johnny Willemsen 2007-05-01 07:49:33 CDT
changed severity