Summary: | ORB crashes if peer dies while ORB is blocked trying to send requests | ||
---|---|---|---|
Product: | TAO | Reporter: | Carlos O'Ryan <coryan> |
Component: | ORB | Assignee: | DOC Center Support List (internal) <tao-support> |
Status: | RESOLVED FIXED | ||
Severity: | critical | ||
Priority: | P3 | ||
Version: | 1.2.3 | ||
Hardware: | All | ||
OS: | All | ||
Bug Depends on: | 1305, 1309 | ||
Bug Blocks: | 1202, 1277 | ||
Attachments: | Regression test for this bug (tarred); Patches to the ORB core; Patches to the protocols | ||
Description
Carlos O'Ryan
2002-08-05 15:14:28 CDT
Created attachment 134 [details]
Regression test for this bug (tarred)
Doh! The repo is frozen, so I attached the regression test to the bug. I will commit it once the beta is out and they thaw the repo.

Last I heard, Carlos was looking into fixing this. If not, we need to take care of it.

Adding dependency on 1305.

Created attachment 150 [details]
Patches to the ORB core.
Created attachment 151 [details]
Patches to the protocols.
OK. I attached two patches that fix this bug.

The first patch: http://deuce.doc.wustl.edu/bugzilla/showattachment.cgi?attach_id=150 modifies the ORB core and solves the problem (at least as far as I can solve it).

The second patch: http://deuce.doc.wustl.edu/bugzilla/showattachment.cgi?attach_id=151 simply modifies the pluggable protocols to match the changes in the ORB core, so it needs no explanation.

As to the first patch, here are the changes in detail:

1) It eliminates the distinction between the pending_upcall_ and refcount_ fields in the Connection_Handler. Having two reference counts is hard to debug and extremely hard to get right. It also makes it hard to state when the object is deleted and hard to analyze the reference counting rules, and it does not help with anything I can see, so it is zapped.
2) The transport_ field in the Connection_Handler is modified atomically.
3) Closing connections is also atomic.
4) When a connection is closed, *all* the activations in the Reactor are removed.

The last one is the really important change, but it does not help without (3).

I also documented the reference counting with REFCNT comments in the places where the count is incremented or decremented. That way we can analyze the reference counting statically and convince ourselves that it is done right.

Please review the changes and let me know what you think. Be advised, I do not have much time to break the changes into smaller portions, so if there is something you do not like, you had better change it yourselves.

Not mine anymore. I submitted the patches and everything. Returning to the tao-support tarpit.

Fixed! Details are available in Mon Oct 21 22:45:02 2002 Balachandran Natarajan <bala@isis-server.isis.vanderbilt.edu>

I ran the tests for this in various ways for almost two days. The only crash I have seen in the test is caused by a stack overflow. After this aggressive testing, we can give some assurance that the bug is fixed.
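
For anyone reading the patch description without the attachments at hand, changes (1), (3) and (4) can be sketched roughly as below. This is not the code from attachment 150. ToyReactor, ToyConnectionHandler and all their members are hypothetical names made up for illustration, and standard C++ atomics stand in for whatever locking the real ORB core uses. The sketch only shows the shape of the idea: one reference count, a close that exactly one caller can win, and removal of every Reactor activation as part of that close.

```cpp
#include <atomic>
#include <map>
#include <mutex>

// Hypothetical stand-in for a reactor: remembers the event mask each
// handler is registered for. Not the ACE_Reactor interface.
class ToyReactor {
public:
    enum : unsigned { READ = 1u, WRITE = 2u };

    void add(const void* handler, unsigned mask) {
        std::lock_guard<std::mutex> guard(lock_);
        masks_[handler] |= mask;
    }

    // Drop every activation of the handler, not just one event type.
    void remove_all(const void* handler) {
        std::lock_guard<std::mutex> guard(lock_);
        masks_.erase(handler);
    }

    // Returns 0 when the handler has no activations left.
    unsigned mask_of(const void* handler) const {
        std::lock_guard<std::mutex> guard(lock_);
        auto it = masks_.find(handler);
        return it == masks_.end() ? 0u : it->second;
    }

private:
    mutable std::mutex lock_;
    std::map<const void*, unsigned> masks_;
};

// Hypothetical connection handler with a single reference count.
class ToyConnectionHandler {
public:
    explicit ToyConnectionHandler(ToyReactor& reactor) : reactor_(reactor) {}

    // REFCNT: incremented by each new holder of the handler.
    void incr_refcount() { refcount_.fetch_add(1, std::memory_order_relaxed); }

    // REFCNT: decremented on release. Returns the count after the
    // decrement; the caller that sees 0 is responsible for destruction.
    long decr_refcount() {
        return refcount_.fetch_sub(1, std::memory_order_acq_rel) - 1;
    }

    long refcount() const { return refcount_.load(); }

    // Atomic close: only the first caller flips closed_ and cleans up.
    // Returns true for that caller and false for everyone else.
    bool close_connection() {
        bool expected = false;
        if (!closed_.compare_exchange_strong(expected, true)) {
            return false;
        }
        reactor_.remove_all(this);
        return true;
    }

    bool is_closed() const { return closed_.load(); }

private:
    ToyReactor& reactor_;
    std::atomic<long> refcount_{1};   // REFCNT: the creator holds the first reference
    std::atomic<bool> closed_{false};
};
```

In this toy version a second thread that calls close_connection() while the first is still cleaning up simply gets false back and touches nothing, which is the property that makes removing all activations safe to do in one place.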