Bug 1237 - Race conditions and dangling pointers while handling LOCATION_FORWARD's
Summary: Race conditions and dangling pointers while handling LOCATION_FORWARD's
Status: NEW
Alias: None
Product: TAO
Classification: Unclassified
Component: ORB (show other bugs)
Version: 1.2.3
Hardware: All All
: P3 normal
Assignee: DOC Center Support List (internal)
URL:
Depends on:
Blocks: 319 1273
  Show dependency tree
 
Reported: 2002-06-26 16:26 CDT by Nanbor Wang
Modified: 2006-08-16 09:06 CDT (History)
0 users

See Also:


Attachments
tarred gzipped file (8.48 KB, application/octet-stream)
2002-12-29 17:16 CST, Nanbor Wang
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Nanbor Wang 2002-06-26 16:26:01 CDT
Review of the code suggests, thanks to Irfan, that we have horrible race
conditions while handling location forwards. Here are the potential places 

- When two threads are communicating to a forwarded place and if one thread    
receives a location forward to a new place the state of the other thread holding
the old pointer seems to be in limbo. 

- Worse still if the thread making the invocation decides to close the forward
location pointers with the other thread still referencing it.
Comment 1 Nanbor Wang 2002-07-04 12:12:23 CDT
Assigning it to me
Comment 2 Nanbor Wang 2002-07-04 12:17:10 CDT
Accepting this one
Comment 3 Nanbor Wang 2002-08-08 22:16:13 CDT
Assigning it to Irfan
Comment 4 Irfan Pyarali 2002-08-09 10:49:58 CDT
I am outlining the design of a proposed solution that Bala and I
discussed today.  This design provides fixes for bugs 1237, 1238,
1239, and 1273. Criticism is welcome.

- Design assumptions

This solutions assumes that the original set of profile is fixed,
i.e., we do not consider LOCATION_FORWARD_PERM messages.
LOCATION_FORWARD_PERM messages have been deprecated in CORBA anyway.
This is important since no locks will be held while traversing the
original set of profiles.  This assumption therefore has performance
implications - applications not using location forwarding will not
have to pay for the overhead associated with managing and traversing
forwarded profiles.

- Normal profile traversal

The traversal algorithm without any forwarded profiles will be as
follows: A thread making an invocation starts by checking if the
Object has been forwarded.  This section assumes that the Object has
not been forwarded (forwarded Objects are described below).  The
Object will have a profile hint pointing to the profile that was last
used successfully.  A thread making a request will use this profile.
If this profile works, well and good.  If not, the thread will advance
to the next profile.  The advance is only local to the thread, unlike
what happens now as described in bug 1238.  The new profile is now
used for the invocation.  If it succeeds, the last successful profile
hint in the Object is updated.  Note that this update is done without
any locks held and assumes that pointer/integer assignments are
atomic.  If the invocation fails, the next profile is used until all
the profiles in the Object have been used.  This may include cycling
through all the profiles, unlike what happens now as described in bug
1239.

- Updating Object after a location forward

When a location forward occurs, the invoking thread grabs a write lock
on the forwarded part of the Object.  It sets a flag on the Object
indicating that it has been forwarded.  It adds the new forwarded
profiles to the Object.  It sets the forwarded profile hint pointing
to the first profile in the forwarded profiles.  After this the write
lock is released.

- Forwarded profile traversal

A thread making an invocation starts by checking if the Object has
been forwarded.  This is done without any locks held.  If the flag is
not set, traversal normally.  If it is set, first grab a read lock on
the forwarded part of the Object.  Check the forward flag again to
make sure that the object is still forwarded.  Then do normal
traversal on the forwarded part. Also make sure to traverse all the
profiles in the forwarded part, unlike what happens now as described
in bug 1273.

- Removing forwarded IOR

If all the profiles in the forwarded part do not work, grab a write
lock, reset the forwarded flag and reset the forwarded profiles.
Comment 5 Carlos O'Ryan 2002-08-09 12:09:42 CDT
I was happily reading your proposal until I got here:

"Note that this update is done without any locks held and assumes that
pointer/integer assignments are atomic."

Sorry, I missed the smileys, because otherwise it is not funny.

If you need a lock then grab a lock: fast is good, RIGHT is better.

The rest reads mostly OK, but I want to mention some alternatives, I'll try to
get to it this weekend.
Comment 6 Nanbor Wang 2002-08-09 12:25:46 CDT
We thought about it. But we were not sure where the "rightness" is being 
compromised (unless or otherwise we come up with something totally bizzare) 
with or without a lock. So the decision to avoid a lock and hence the assertion 
to catch everyone's attention.

We would love to hear other alternatives too.
Comment 7 Irfan Pyarali 2002-08-09 15:40:55 CDT
Let me elaborate what I meant: If you have variable x shared by two
threads, one thread reading that variable and the other writing to it,
it is ok in this use case if the reader thread either gets the old
value or the new one since the shared variable is just a hint.  Even
in the case where the reader thread gets a half baked value, we can
check the validity of hint index before proceeding.  I think this
should be ok.
Comment 8 Carlos O'Ryan 2002-08-09 17:14:32 CDT
Aha... OK, as long as you can check the validity of your 'hints' (and I really
hope you can do that without risking safety) then you should be OK.  But you
know, better than I do, that's not called "atomic", you almost scared me to
death this morning.
Comment 9 Nanbor Wang 2002-08-09 17:59:31 CDT
The procedure outlined above would start running to problems if the forwarded
location starts making nested upcalls. 

The way to get around this is to leave the read-lock *after* a succesfull send
() of the request message (to the forwarded location) and before going to wait
for the reply. Holding the lock while going to wait for the reply would result
in nasty surprises!
Comment 10 Nanbor Wang 2002-12-29 17:14:48 CST
A test case for reproducing this problem. Test donated by Veritas folks.. It 
will be added as the next entry

Test case will show problems if the run_test.pl script is run as 

perl ./run_test.pl --nouseiorfile --iterations 5

and no problems if run as 

perl ./run_test.pl --useiorfile --iterations 10
Comment 11 Nanbor Wang 2002-12-29 17:16:16 CST
Created attachment 175 [details]
tarred gzipped file
Comment 12 Nanbor Wang 2003-01-21 07:56:23 CST
In our daily builds the Muxed_GIOP_Versions test fails due to this problem. 
Comment 13 Johnny Willemsen 2004-08-01 12:12:09 CDT
I will handle this
Comment 14 Johnny Willemsen 2004-08-01 12:13:52 CDT
accept
Comment 15 Johnny Willemsen 2006-08-16 09:06:18 CDT
to pool