Please report new issues athttps://github.com/DOCGroup
Review of the code suggests, thanks to Irfan, that we have horrible race conditions while handling location forwards. Here are the potential places - When two threads are communicating to a forwarded place and if one thread receives a location forward to a new place the state of the other thread holding the old pointer seems to be in limbo. - Worse still if the thread making the invocation decides to close the forward location pointers with the other thread still referencing it.
Assigning it to me
Accepting this one
Assigning it to Irfan
I am outlining the design of a proposed solution that Bala and I discussed today. This design provides fixes for bugs 1237, 1238, 1239, and 1273. Criticism is welcome. - Design assumptions This solutions assumes that the original set of profile is fixed, i.e., we do not consider LOCATION_FORWARD_PERM messages. LOCATION_FORWARD_PERM messages have been deprecated in CORBA anyway. This is important since no locks will be held while traversing the original set of profiles. This assumption therefore has performance implications - applications not using location forwarding will not have to pay for the overhead associated with managing and traversing forwarded profiles. - Normal profile traversal The traversal algorithm without any forwarded profiles will be as follows: A thread making an invocation starts by checking if the Object has been forwarded. This section assumes that the Object has not been forwarded (forwarded Objects are described below). The Object will have a profile hint pointing to the profile that was last used successfully. A thread making a request will use this profile. If this profile works, well and good. If not, the thread will advance to the next profile. The advance is only local to the thread, unlike what happens now as described in bug 1238. The new profile is now used for the invocation. If it succeeds, the last successful profile hint in the Object is updated. Note that this update is done without any locks held and assumes that pointer/integer assignments are atomic. If the invocation fails, the next profile is used until all the profiles in the Object have been used. This may include cycling through all the profiles, unlike what happens now as described in bug 1239. - Updating Object after a location forward When a location forward occurs, the invoking thread grabs a write lock on the forwarded part of the Object. It sets a flag on the Object indicating that it has been forwarded. It adds the new forwarded profiles to the Object. It sets the forwarded profile hint pointing to the first profile in the forwarded profiles. After this the write lock is released. - Forwarded profile traversal A thread making an invocation starts by checking if the Object has been forwarded. This is done without any locks held. If the flag is not set, traversal normally. If it is set, first grab a read lock on the forwarded part of the Object. Check the forward flag again to make sure that the object is still forwarded. Then do normal traversal on the forwarded part. Also make sure to traverse all the profiles in the forwarded part, unlike what happens now as described in bug 1273. - Removing forwarded IOR If all the profiles in the forwarded part do not work, grab a write lock, reset the forwarded flag and reset the forwarded profiles.
I was happily reading your proposal until I got here: "Note that this update is done without any locks held and assumes that pointer/integer assignments are atomic." Sorry, I missed the smileys, because otherwise it is not funny. If you need a lock then grab a lock: fast is good, RIGHT is better. The rest reads mostly OK, but I want to mention some alternatives, I'll try to get to it this weekend.
We thought about it. But we were not sure where the "rightness" is being compromised (unless or otherwise we come up with something totally bizzare) with or without a lock. So the decision to avoid a lock and hence the assertion to catch everyone's attention. We would love to hear other alternatives too.
Let me elaborate what I meant: If you have variable x shared by two threads, one thread reading that variable and the other writing to it, it is ok in this use case if the reader thread either gets the old value or the new one since the shared variable is just a hint. Even in the case where the reader thread gets a half baked value, we can check the validity of hint index before proceeding. I think this should be ok.
Aha... OK, as long as you can check the validity of your 'hints' (and I really hope you can do that without risking safety) then you should be OK. But you know, better than I do, that's not called "atomic", you almost scared me to death this morning.
The procedure outlined above would start running to problems if the forwarded location starts making nested upcalls. The way to get around this is to leave the read-lock *after* a succesfull send () of the request message (to the forwarded location) and before going to wait for the reply. Holding the lock while going to wait for the reply would result in nasty surprises!
A test case for reproducing this problem. Test donated by Veritas folks.. It will be added as the next entry Test case will show problems if the run_test.pl script is run as perl ./run_test.pl --nouseiorfile --iterations 5 and no problems if run as perl ./run_test.pl --useiorfile --iterations 10
Created attachment 175 [details] tarred gzipped file
In our daily builds the Muxed_GIOP_Versions test fails due to this problem.
I will handle this
accept
to pool