Please report new issues athttps://github.com/DOCGroup
The following piece of code segment in ACE_TP_Reactor::handle_socket_events () could throw up invalid pointer pointer usage ...... // in addition to the event handler before we make a check for // the resume_handler (). if (dispatch_info.event_handler_ != 0 && this->handler_rep_.find (dispatch_info.handle_) != 0) { flag = dispatch_info.event_handler_->resume_handler (); } If the dispatching returned 0 and if the connection was deleted (quite possible the way we have set things up in TAO) the acces to event_handler_->resume_handler () could crash
Accepting the bug
A suggested fix for this bug, change the code in the Reactor to say: if (dispatch_info.event_handler_ != 0 && this->handler_rep_.find(dispatch_info.handle_) == dispatch_info.event_handler_) { // ... as before ... } however this approach exposes a race condition, namely, there are no guarantees about the lifetime of dispatch_info.event_handler_. Or rather, the only guarantee is that the pointer must remain valid while it is still registered with the Reactor. However, the solution proposed checks if the pointer is there and then use the pointer without holding a lock. Thus the change is incomplete, a lock to protect the critical section is required. I suspect that the same lock is required today, but since so many other things are broken the race is probably never happening (i.e. other things break first)
Adding Carlos to the CC list
Okay, coming back to the issue of lock, holding a lock (in the case of the TP_Reactor it is the token) is expensive since it involves a *context switch*. Adding yet another context switch may be bad. BUT, this context switch certainly is not in the path of invocation but only after making an invocation and hence should be okay.
Errr, context switch issues aside (the Reactor uses an incredibly expensive synchronization mechanism), I will repeat a mantra: Make it Run, Make it Right, Make it Fast, Make it Simple. We are on "Make it Right". Or put in another words, performance is good, not crashing is better. I will never accept a performance argument against correctness, certainly against a particular solution, but not against correctness. If those are not good enough reasons: the Token is already used, that find() operation should have a synchronization in it somewhere, Oh please say it does, if it does not then some other thread could corrupt the data structure and you get all sorts of interesting crashes! So all you need to do is make sure the mutex is acquired only once and things will be fine performance-wise too (at least not any worse than today.) OK, enough rambling, here is a way to do this: save the result from: dispatch_info.event_handler_->resume_handler() before releasing the lock... this assumes that the flag is a fixed value, i.e. event handlers are always application-resumed or always Reactor-resumed, the current API is crappy because it does not enforce the rule (nor does it say if this is a rule or not), no need to reacquire a mutex. You still need to check the event handler though (i.e. call find() and check that it is not nil) before calling this->resume_handler(), otherwise you could resume some other instantiation of the same ACE_HANDLE, with a different event_handler registered for it. [That last statement requires a longer explanation, probably its own bug report, per-our conversation this morning]
Created attachment 125 [details] A proposed fix for bug 1231
I've attached a proposed fix for bug 1231, it seems to work in that my test no longer crashes, nor does valgrind report memory problems.
The bug is fixed and here is the ChangeLog entry Wed Jun 19 14:25:52 2002 Balachandran Natarajan <bala@cs.wustl.edu>