Bug 1270 - ORB / Reactor segfaults if peer dies while trying to write.
Summary: ORB / Reactor segfaults if peer dies while trying to write.
Status: RESOLVED FIXED
Alias: None
Product: TAO
Classification: Unclassified
Component: ORB
Version: 1.2.3
Hardware: All All
Importance: P3 major
Assignee: DOC Center Support List (internal)
URL:
Depends on: 1274 1305 1309
Blocks: 1202 1277
Reported: 2002-08-06 10:58 CDT by Carlos O'Ryan
Modified: 2002-11-06 11:06 CST
CC List: 0 users

See Also:


Attachments
The regression test, tarred (90.00 KB, application/octet-stream)
2002-08-06 10:59 CDT, Carlos O'Ryan

Description Carlos O'Ryan 2002-08-06 10:58:26 CDT
First of all, this bug is very similar to bug 1269; in fact, they *might* be
duplicates, but I filed them separately because:

1) The stack traces are different, so they could actually be two problems.
2) This one crashes inside the Reactor.
3) I want to know when each one is fixed, and I'm mostly interested in this one.

The problem has been dragging on for a while (see bug 1202 and its brethren for
more details), but basically the ORB is running the Leader/Follower loop waiting
for a send to complete.  During a handle_output() call the peer's death is
detected, so we return -1 from handle_output() to close the socket.

The socket is removed from the reactor, but *ONLY* from the write mask; it is
left in the read mask.  The problem is that the ORB closes the socket too, and
somehow the Connection_Handler is also deleted.

The Reactor then has this invalid file descriptor in the read mask (remember, it
was only removed from the write mask), yet it tries to use the corresponding
Event_Handler (which was deleted), and the whole thing crashes.  Multiple funny
things are going on at the same time (why does the reactor remove only the write
mask but call handle_close()?  why does the ORB delete the event handler
without calling remove_handler() first?  why does
reactor::remove_handler(Event_Handler*) try to use the get_handle() method?),
most of them way out of my expertise.

I'll be attaching a regression test shortly.
Comment 1 Carlos O'Ryan 2002-08-06 10:59:56 CDT
Created attachment 135 [details]
The regression test, tarred
Comment 2 Carlos O'Ryan 2002-08-06 11:00:21 CDT
This bug should block 1202.
Comment 3 Nanbor Wang 2002-08-20 11:55:54 CDT
Carlos seems to have a better handle on this, so I'm assigning it to him.
Comment 4 Carlos O'Ryan 2002-09-09 16:37:15 CDT
Bug 1305 was getting triggered inside the ORB too.  Adding a dependency.
Comment 5 Carlos O'Ryan 2002-10-22 13:29:07 CDT
Not mine anymore.  I submitted the patches and everything.  Returning to the
tao-support tarpit.
Comment 6 Nanbor Wang 2002-11-02 20:09:21 CST
Fixed! Details are available in the ChangeLog entry:

Mon Oct 21 22:45:02 2002  Balachandran Natarajan  <bala@isis-server.isis.vanderbilt.edu>

I have run these tests in various ways over the past two days.  The only
problem I have seen is the test crashing because of a stack overflow.  Given
this aggressive testing, we can offer some assurance that this is fixed.
Comment 7 Scott Harris 2002-11-06 10:58:23 CST
What is the expected result from $TAO_ROOT/tests/Bug_1270_Regression?
I ran it for 5 minutes and it just kept repeating output like this:
      [harris_s@paris Bug_1270_Regression]$ perl run_test.pl 
      (14143|1024) Echo::echo_payload, sleeping
      (14142|1024) Echo::echo_payload, sleeping
      (14144|1024) Echo::echo_payload, sleeping
      (14143|1024) Echo::echo_payload, aborting
      (14142|1024) Echo::echo_payload, aborting
      (14144|1024) Echo::echo_payload, aborting
      (14159|1024) Echo::echo_payload, sleeping
      (14158|1024) Echo::echo_payload, sleeping
      (14157|1024) Echo::echo_payload, sleeping
      ...

I am using gcc 2.96 on RH 7.2.
Comment 8 Nanbor Wang 2002-11-06 11:01:45 CST
The server shouldn't die, even when the client dies.
Comment 9 Scott Harris 2002-11-06 11:06:00 CST
Should it run for the full 70 minutes in the perl script?