1233 – Apparent race conditions in ACE_TP_Reactor::handle_socket_events

Summary: Apparent race conditions in ACE_TP_Reactor::handle_socket_events

Status:	RESOLVED DUPLICATE of bug 1020

Alias:	None

Product:	ACE
Classification:	Unclassified
Component:	ACE Core
Version:	5.2.3
Hardware:	All All

Importance:	P3 major
Assignee:	DOC Center Support List (internal)

Blocks:	1202

Reported:	2002-06-19 13:58 CDT by Carlos O'Ryan
Modified:	2002-06-20 13:12 CDT

Description Carlos O'Ryan 2002-06-19 13:58:54 CDT

Code examination of ACE_TP_Reactor::handle_socket_events reveals several subtle
and somewhat hard-to-reproduce race conditions.  I'm not positive they exist,
but I will need very solid arguments to believe otherwise.

1) The code releases the guard and then uses an event handler.  Yes, the handler
is suspended to prevent multiple threads from calling handle_XXX(), but there is
no guarantee that the application will not call remove_handler(..., DONT_CALL)
and then destroy the handler.  Nor can the application find out whether the
handle is suspended and therefore in the middle of an upcall.
    In other words, the TP_Reactor is at risk of using released memory: a small
risk, but a real one.

2) After returning from dispatch_socket_event() the code calls:
    this->handler_rep_.find()
   The code there has no locks, and the data structure is shared.  Though we are
lucky in that there is little chance of memory faults or corruption (the
structure is fixed size), there is still a risk of reading invalid or
inconsistent data.

I think race 2 can be fixed by re-acquiring the guard after coming back from the
dispatch_socket_event() call.  However, the first race condition cannot be fixed
without more cooperation between the Reactor and the application.
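
The re-acquire idea for race 2 can be sketched in standard C++ without ACE. Everything below (ToyHandler, ToyRepository, ToyReactor, dispatch) is a hypothetical stand-in, not ACE code and not the change that was checked in; it only shows the lock being dropped for the upcall and taken again before the shared table is read.

```cpp
#include <map>
#include <mutex>

// Hypothetical stand-in for an event handler; not an ACE class.
struct ToyHandler {
    int upcalls = 0;
    void handle_input() { ++upcalls; }
};

// Hypothetical shared handler table. It has no lock of its own,
// so callers must hold the reactor lock around every call.
class ToyRepository {
public:
    void bind(int handle, ToyHandler* h) { table_[handle] = h; }
    void unbind(int handle) { table_.erase(handle); }
    ToyHandler* find(int handle) const {
        auto it = table_.find(handle);
        if (it == table_.end()) return nullptr;
        return it->second;
    }
private:
    std::map<int, ToyHandler*> table_;
};

class ToyReactor {
public:
    void register_handler(int handle, ToyHandler* h) {
        std::lock_guard<std::mutex> g(lock_);
        repo_.bind(handle, h);
    }
    void remove_handler(int handle) {
        std::lock_guard<std::mutex> g(lock_);
        repo_.unbind(handle);
    }
    // Returns true if the handle is still registered after the upcall.
    bool dispatch(int handle) {
        std::unique_lock<std::mutex> guard(lock_);
        ToyHandler* h = repo_.find(handle);
        if (h == nullptr) return false;
        guard.unlock();     // release the guard for the upcall
        h->handle_input();  // race 1 is NOT addressed: h may be dangling here
        guard.lock();       // re-acquire before touching the shared table
        return repo_.find(handle) != nullptr;
    }
private:
    std::mutex lock_;
    ToyRepository repo_;
};
```

The second find() now runs under the same lock as bind() and unbind(), so it cannot observe a half-updated table. The upcall itself still runs unlocked, which is why this does nothing for race 1.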

I would suggest using the fixes attached to bug 1020, but if that is not
possible then we need to define precisely how applications are supposed to avoid
this problem.
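
One possible form of Reactor/application cooperation for race 1 is shared ownership of the handler. This is an assumption for illustration only: the report does not say what the referenced attachments do, and CountedHandler, CountingReactor and pin() are hypothetical names, not ACE APIs. The sketch keeps handlers in std::shared_ptr so that a thread in the middle of an upcall holds its own reference.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical handler that records its own destruction in a flag.
struct CountedHandler {
    explicit CountedHandler(bool* destroyed) : destroyed_(destroyed) {}
    ~CountedHandler() { *destroyed_ = true; }
    void handle_input() { ++upcalls; }
    int upcalls = 0;
private:
    bool* destroyed_;
};

class CountingReactor {
public:
    void register_handler(int handle, std::shared_ptr<CountedHandler> h) {
        std::lock_guard<std::mutex> g(lock_);
        table_[handle] = std::move(h);
    }
    // Drops only the reactor's reference; the object is destroyed
    // once the last outstanding reference goes away.
    void remove_handler(int handle) {
        std::lock_guard<std::mutex> g(lock_);
        table_.erase(handle);
    }
    // Takes an extra reference under the lock. The caller makes the
    // upcall through the returned pointer after the lock is released.
    std::shared_ptr<CountedHandler> pin(int handle) {
        std::lock_guard<std::mutex> g(lock_);
        auto it = table_.find(handle);
        if (it == table_.end()) return std::shared_ptr<CountedHandler>();
        return it->second;
    }
private:
    std::mutex lock_;
    std::map<int, std::shared_ptr<CountedHandler>> table_;
};
```

With this arrangement a remove_handler() call that lands during an upcall unregisters the handler immediately but defers destruction until the dispatching thread releases its pinned reference. The cost is that applications can no longer delete handlers directly, which is exactly the kind of contract the comment above asks to have defined.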

Comment 1 Nanbor Wang 2002-06-19 14:04:45 CDT

#2 is fixed in my workspace and I should be checking it in within another hour or so.

Comment 2 Nanbor Wang 2002-06-19 14:34:11 CDT

Much has been said and written about #1. Bug 1031 gives a sample case for #1.
Whether another thread calls remove_handler () or the same thread calls
remove_handler () as part of the upcall is immaterial. The result is a crash and
the source of the problem is the same. IMHO, I would prefer treating #1 as
another sample use case that #1031 would fix. Having many bug reports is only
going to make things hard and skewed.

BTW, there are *no* fixes attached to bug 1020.

Comment 3 Carlos O'Ryan 2002-06-19 14:52:28 CDT

Sorry, I meant the attachments to bug 1031.
If you think this bug is a duplicate of 1020, just mark it so.

Comment 4 Nanbor Wang 2002-06-19 15:51:07 CDT

Point #2 is fixed and the ChangeLog entry for that change is

Wed Jun 19 14:25:52 2002  Balachandran Natarajan  <bala@cs.wustl.edu>

and Point #1 is documented in bug 1031.

Comment 5 Nanbor Wang 2002-06-20 13:12:27 CDT

Point #1 of this bug is a duplicate of bug 1020. Point #2 is already fixed.

*** This bug has been marked as a duplicate of 1020 ***