Please report new issues at https://github.com/DOCGroup
The ORB can deadlock during a call to shutdown() or destroy(). Basically, a mutex is held while the ORB destroys the POA; the POA calls application code during its destruction (as servants are released), and that code can call back into the ORB, hence the deadlock. The offending code is here:

------------------------------------
void
TAO_ORB_Core::shutdown (CORBA::Boolean wait_for_completion
                        ACE_ENV_ARG_DECL)
{
  ACE_GUARD (TAO_SYNCH_MUTEX, monitor, this->lock_);

  if (this->has_shutdown () == 0)
    {
------------------------------------

It looks like the intent was to make the operation a no-op if the shutdown() flag was already set. That can easily be achieved by changing the code to:

------------------------------------------------------------------------
void
TAO_ORB_Core::shutdown (CORBA::Boolean wait_for_completion
                        ACE_ENV_ARG_DECL)
{
  {
    ACE_GUARD (TAO_SYNCH_MUTEX, monitor, this->lock_);

    if (this->has_shutdown () != 0)
      return;

    this->has_shutdown_ = 1;
  }

  // Put the rest of the code here...
}
------------------------------------------------------------------------

A full-blown state machine for the shutdown process might be useful too...
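Just to sketch what I mean by the state machine (hypothetical names, not actual TAO code; the enum and the shutdown_state_ member are assumptions):

------------------------------------------------------------------------
// Hypothetical sketch only -- the enum and member names are not in TAO.
enum TAO_ORB_Shutdown_State
{
  ORB_RUNNING,        // normal operation
  ORB_SHUTTING_DOWN,  // shutdown() in progress, further calls are no-ops
  ORB_SHUTDOWN        // shutdown() has completed
};

void
TAO_ORB_Core::shutdown (CORBA::Boolean wait_for_completion)
{
  ACE_UNUSED_ARG (wait_for_completion);

  {
    ACE_GUARD (TAO_SYNCH_MUTEX, monitor, this->lock_);

    // Only the first caller advances the state; any later (or
    // re-entrant) call returns immediately, so callbacks made during
    // cleanup can never deadlock on this->lock_.
    if (this->shutdown_state_ != ORB_RUNNING)
      return;

    this->shutdown_state_ = ORB_SHUTTING_DOWN;
  }

  // ... perform the actual cleanup here, without holding this->lock_ ...

  {
    ACE_GUARD (TAO_SYNCH_MUTEX, monitor, this->lock_);
    this->shutdown_state_ = ORB_SHUTDOWN;
  }
}
------------------------------------------------------------------------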
Sounds about right. Could you check in this fix once the beta goes out? If that is not possible, please let me know and I can add this to my list. Thanks!
Urm, please don't check in the fix just yet. Other people, namely me, would like to discuss it. :) The only problem I see with that fix is that we'd be setting the ORB into a "shutdown" state before any of the actual shutdown code has run. Suppose an exception is thrown before shutdown() has completed? Is it reasonable to allow the application to retry the shutdown()? If so, this fix prevents it, since subsequent calls to shutdown() will be no-ops. The current code isn't all that great either, since it sets the has_shutdown_ flag in the middle of the shutdown() code. As an interim solution, can we add one or more ACE_Reverse_Locks to release the lock during any calls that potentially call back on the ORB instead?
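Roughly something like this, just a sketch: the cleanup call and its argument are assumptions taken from this discussion, and the race issues mentioned below would still need handling:

------------------------------------------------------------------------
#include "ace/Guard_T.h"
#include "ace/Reverse_Lock_T.h"

void
TAO_ORB_Core::shutdown (CORBA::Boolean wait_for_completion)
{
  ACE_UNUSED_ARG (wait_for_completion);

  ACE_GUARD (TAO_SYNCH_MUTEX, monitor, this->lock_);

  if (this->has_shutdown () != 0)
    return;

  this->has_shutdown_ = 1;

  {
    // Temporarily release this->lock_ around code that may call back
    // into the ORB; the guard re-acquires it when this scope exits.
    ACE_Reverse_Lock<TAO_SYNCH_MUTEX> reverse_lock (this->lock_);
    ACE_Guard<ACE_Reverse_Lock<TAO_SYNCH_MUTEX> > reverse_monitor (reverse_lock);

    this->adapter_registry_.close (0);  // argument is illustrative
  }

  // ... the rest of shutdown(), again under this->lock_ ...
}
------------------------------------------------------------------------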
I thought about having a reverse lock, but that would have race conditions. Imagine that you release the lock to do adapter_registry_.close (); it is then possible for two threads to do the same thing, which is "bad". There are ways to fix that, but the problem starts getting nasty! If exceptions are thrown from the shutdown () call, it is better to leave things as they are instead of calling shutdown () again and getting a core dump! Say shutdown () made 4 calls and, during the first invocation, the fourth call threw an exception. Would you try invoking shutdown () again? :-) I would like to know your opinions on these!
I wasn't implying that we should just add reverse locks blindly. I'm well aware of the race condition possibilities. Let's assume that I've actually done some multithreaded programming, run into deadlocks and race conditions, and corrected them. The idea is to determine where and when reverse locks can be employed, and if not, why not. Perhaps using the ORB lock is too coarse grained. What about using finer grained locks? Also, Carlos' state machine idea sounds interesting, too. The argument about a seg fault occurring from calling shutdown() multiple times is bogus. Regardless of whether shutdown() is called multiple times, with an exception or not, shutdown() should never seg fault. The ORB should be robust enough to avoid such a fatal condition.
Created attachment 133 [details] Full blown patch file for this bug.
Sorry, I added the attachment before seeing the discussion. Ossama raises some interesting points; here are some comments:

1) Raising exceptions: the main concern is what happens if any of the operations raises an exception during shutdown(). First, that should not happen: we are basically calling cleanup methods here, and raising exceptions during cleanup is evil (check Herb Sutter's "Exceptional C++" book for details.) Basically, throwing exceptions in the middle of a cleanup operation is evil because there is no way to complete the cleanup or back out the changes, so you are stuck with half-destructed, inconsistent objects in your system, a worse situation than having those exceptions ignored. Now, if you *HAVE* to throw exceptions (for example because "now it is not a good time to clean up"), then you should check all the exceptional conditions first and then start the cleanup; that's what we do (I believe.) Unfortunately that is not enough: cleaning up the ORB involves calls to application code, where some poor misguided soul could decide to throw. IMHO we should enforce the "do not throw" rule by having throw specs (to document our intent) and catching exceptions (to protect ourselves against stupid compilers and users that "accidentally" remove the throw specs.) In short, "do not let exceptions escape from shutdown."

2) Finer grained locking: yes, this could solve my current problem. But some poor fool of a user will someday call shutdown() in the middle of shutdown(); no matter how fine grained your locks are, you will deadlock in that case. So I do not like it as a general solution.

3) Reverse locks: race conditions aside (I assume we can solve them), I see no advantage in this approach vs. my solution, and mine looks cleaner, but I'm obviously biased here.

So my final vote is: use my patch, but also enforce the exception rules during cleanup.
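To illustrate the "do not let exceptions escape from shutdown" rule, here is a rough sketch. Plain try/catch is shown for brevity (the real code would use the ACE exception emulation macros), and the cleanup calls and their arguments are placeholders taken from this discussion:

------------------------------------------------------------------------
void
TAO_ORB_Core::shutdown (CORBA::Boolean wait_for_completion)
{
  ACE_UNUSED_ARG (wait_for_completion);

  // ... check exceptional conditions and flip the shutdown flag first ...

  try
    {
      // Cleanup calls that may end up running application code
      // (servant release, interceptor destruction, etc.).
      this->adapter_registry_.close (0);  // argument is illustrative
      this->destroy_interceptors ();
    }
  catch (...)
    {
      // Swallow anything a misguided callback throws: letting an
      // exception escape here would leave half-destructed objects
      // behind, which is worse than ignoring the exception.
    }
}
------------------------------------------------------------------------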
I agree that we shouldn't let exceptions escape from shutdown() once the shutdown process has actually started. If I understand you correctly, that means we'll have to catch all exceptions thrown by TAO_Adapter_Registry::close() and TAO_ORB_Core::destroy_interceptors(), or prevent them from throwing exceptions in the first place (enforced/documented with a throw()), right? I suppose users can always call POA::destroy() manually before ORB::shutdown() if they want to see any POA exceptions that occur during POA destruction. Anyway, I'm okay with Carlos' new patch. Thanks Carlos!
Just to answer Ossama's questions:

> I agree that we shouldn't let exceptions escape from shutdown()
> once the shutdown process has actually started. If I understand
> you correctly, that means we'll have to catch all exceptions
> thrown by TAO_Adapter_Registry::close()

Yes.

> and TAO_ORB_Core::destroy_interceptors(),

Yes.

> or prevent them from throwing exceptions in the first place

Err, I would say "*AND* prevent them...."; paranoia has paid off nicely in this context.

> (enforced/documented with a throw()), right?

That too :-)

BTW, the question now is: what do we do when one of the application callbacks does raise an exception in a context where that is not allowed? I would like to see such problems logged, but using something that the application can hook into, so they (me :-) can log them or handle them as they see fit. This is a matter for a longer discussion, and I will bring it up in devo_group.
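Something along these lines is what I have in mind for the hook (a purely hypothetical interface, nothing like it exists in TAO today):

------------------------------------------------------------------------
// Purely hypothetical interface, shown only to illustrate the idea.
class TAO_Shutdown_Error_Handler
{
public:
  virtual ~TAO_Shutdown_Error_Handler (void) {}

  // Called by the ORB when an application callback raises an exception
  // in a context where exceptions are not allowed (e.g. during
  // shutdown); a default implementation could simply log the problem.
  virtual void handle_exception (const char *where) = 0;
};
------------------------------------------------------------------------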
Accepting for tao-support. Can be fixed after the beta!
IMHO this is a blocker too; in any case, it is very easy to fix since patches are already available.
Fixed! Please see Tue Aug 27 12:40:12 2002 Balachandran Natarajan <bala@isis-server.vuse.vanderbilt.edu> for details.