Bug 1231 - Possible dangling reference to a event handler in the TP_Reactor
Summary: Possible dangling reference to a event handler in the TP_Reactor
Status: RESOLVED FIXED
Alias: None
Product: ACE
Classification: Unclassified
Component: ACE Core (show other bugs)
Version: 5.2.3
Hardware: All All
: P3 normal
Assignee: Nanbor Wang
URL:
Depends on:
Blocks: 1202
  Show dependency tree
 
Reported: 2002-06-18 13:40 CDT by Nanbor Wang
Modified: 2002-06-19 14:24 CDT (History)
1 user (show)

See Also:


Attachments
A proposed fix for bug 1231 (2.80 KB, patch)
2002-06-19 13:38 CDT, Carlos O'Ryan
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Nanbor Wang 2002-06-18 13:40:03 CDT
The following piece of code segment in ACE_TP_Reactor::handle_socket_events ()
could throw up invalid pointer pointer usage
......
      // in addition to the event handler before we make a check for
      // the resume_handler ().
      if (dispatch_info.event_handler_ != 0 &&
          this->handler_rep_.find (dispatch_info.handle_) != 0)
        {
         flag =
           dispatch_info.event_handler_->resume_handler ();
        }

If the dispatching returned 0 and if the connection was deleted (quite possible
the way we have set things up in TAO) the acces to
event_handler_->resume_handler () could crash
Comment 1 Nanbor Wang 2002-06-18 13:40:21 CDT
Accepting the bug
Comment 2 Carlos O'Ryan 2002-06-18 15:39:59 CDT
A suggested fix for this bug, change the code in the Reactor to say:

if (dispatch_info.event_handler_ != 0
    && this->handler_rep_.find(dispatch_info.handle_)
       == dispatch_info.event_handler_)
{
  // ... as before ...
}

however this approach exposes a race condition, namely, there are no guarantees
about the lifetime of dispatch_info.event_handler_.  Or rather, the only
guarantee is that the pointer must remain valid while it is still registered
with the Reactor.  However, the solution proposed checks if the pointer is there
and then use the pointer without holding a lock.  Thus the change is incomplete,
a lock to protect the critical section is required.  I suspect that the same
lock is required today, but since so many other things are broken the race is
probably never happening (i.e. other things break first)
Comment 3 Nanbor Wang 2002-06-18 16:18:29 CDT
Adding Carlos to the CC list
Comment 4 Nanbor Wang 2002-06-18 16:22:41 CDT
Okay, coming back to the issue of lock, holding a lock (in the case of the 
TP_Reactor it is the token) is expensive since it involves a *context switch*. 
Adding yet another context switch may be bad. BUT, this context switch 
certainly is not in the path of invocation but only after making an invocation 
and hence should be okay. 
Comment 5 Carlos O'Ryan 2002-06-18 16:44:20 CDT
Errr, context switch issues aside (the Reactor uses an incredibly expensive
synchronization mechanism), I will repeat a mantra:
Make it Run, Make it Right, Make it Fast, Make it Simple.

We are on "Make it Right".  Or put in another words, performance is good, not
crashing is better.  I will never accept a performance argument against
correctness, certainly against a particular solution, but not against correctness.

If those are not good enough reasons:  the Token is already used, that find()
operation should have a synchronization in it somewhere, Oh please say it does,
if it does not then some other thread could corrupt the data structure and you
get all sorts of interesting crashes!  So all you need to do is make sure the
mutex is acquired only once and things will be fine performance-wise too (at
least not any worse than today.)

OK, enough rambling, here is a way to do this:  save the result from:

dispatch_info.event_handler_->resume_handler()

before releasing the lock... this assumes that the flag is a fixed value, i.e.
event handlers are always application-resumed or always Reactor-resumed, the
current API is crappy because it does not enforce the rule (nor does it say if
this is a rule or not), no need to reacquire a mutex.

You still need to check the event handler though (i.e. call find() and check
that it is not nil) before calling this->resume_handler(), otherwise you could
resume some other instantiation of the same ACE_HANDLE, with a different
event_handler registered for it. [That last statement requires a longer
explanation, probably its own bug report, per-our conversation this morning]
Comment 6 Carlos O'Ryan 2002-06-19 13:38:02 CDT
Created attachment 125 [details]
A proposed fix for bug 1231
Comment 7 Carlos O'Ryan 2002-06-19 13:40:09 CDT
I've attached a proposed fix for bug 1231, it seems to work in that my test no
longer crashes, nor does valgrind report memory problems.
Comment 8 Nanbor Wang 2002-06-19 14:24:27 CDT
The bug is fixed and here is the ChangeLog entry 
Wed Jun 19 14:25:52 2002  Balachandran Natarajan  <bala@cs.wustl.edu>