Please report new issues at https://github.com/DOCGroup
The new load balancer has race conditions in all methods that implement those found in the PortableGroup::{GenericFactory, ObjectGroupManager, PropertyManager} interfaces. TAO's PortableGroup library has a separate lock for each of these interfaces. However, locking at that level is too fine-grained. Higher-level code, such as that found in the new LB, ends up with race conditions because the state of the underlying PortableGroup code may change during a call to the higher-level LB code. Proposed fix: remove the locks from the PortableGroup library, and add locks to the LB code that calls the PortableGroup implementation. This makes synchronization coarser-grained, but addresses the race conditions. Performance isn't an issue here since the methods in question are not in the critical path of the load balancer.
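To illustrate the proposed change, here is a minimal sketch of coarse-grained locking. It is not TAO code: `GroupStore` and `LoadBalancer` are hypothetical stand-ins for the PortableGroup layer and the LB, and `std::mutex` stands in for whatever lock type the real code would use. The point is only that the check and the update happen under one lock held by the caller, so the lower layer's state cannot change in between.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <set>
#include <string>

// Hypothetical stand-in for the PortableGroup layer.
// It does no locking of its own; the caller is responsible.
class GroupStore {
public:
  void create_group(const std::string& group) { groups_[group]; }

  // Returns false if the group is unknown or the member already exists.
  bool add_member(const std::string& group, const std::string& member) {
    auto it = groups_.find(group);
    if (it == groups_.end()) return false;
    return it->second.insert(member).second;
  }

  std::size_t member_count(const std::string& group) const {
    auto it = groups_.find(group);
    return it == groups_.end() ? 0 : it->second.size();
  }

private:
  std::map<std::string, std::set<std::string>> groups_;
};

// Hypothetical stand-in for the load balancer. One lock covers every
// call into the store, so a multi-step operation sees consistent state.
class LoadBalancer {
public:
  void create_group(const std::string& group) {
    std::lock_guard<std::mutex> guard(lock_);
    store_.create_group(group);
  }

  // Check-then-act under a single lock: the count cannot change
  // between the limit check and the insertion.
  bool add_member_with_limit(const std::string& group,
                             const std::string& member,
                             std::size_t limit) {
    std::lock_guard<std::mutex> guard(lock_);
    if (store_.member_count(group) >= limit) return false;
    return store_.add_member(group, member);
  }

  std::size_t member_count(const std::string& group) const {
    std::lock_guard<std::mutex> guard(lock_);
    return store_.member_count(group);
  }

private:
  mutable std::mutex lock_;
  GroupStore store_;
};
```

With per-interface locks inside `GroupStore` instead, `member_count()` and `add_member()` would each be atomic on their own, but another thread could add a member between the two calls and push the group over the limit.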
Blocker for the TAO 1.3 release.
Jai should be kept apprised of this bug, too.
Mine.
It turns out there aren't as many race conditions as I thought. The primary race conditions exist in the load balancing strategies that call back on the LoadManager. Strictly speaking, the so-called race conditions are not race conditions, since each operation is performed atomically. The real problem is that the built-in load balancing strategies retrieve information from the LoadManager through the standard public methods. Unfortunately, the retrieved information, such as group membership, may already have been made obsolete by related calls from another thread by the time the strategy acts on it. So, we need to make a decision. Should the built-in load balancing strategies have the ability to lock out other threads when calling back on the LoadManager, or should a "protocol" be defined that specifies how the strategies should behave if the retrieved information is obsolete?
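A rough sketch of the two options, to make the question concrete. None of this is the actual LoadManager interface: the class, `choose_locked()`, `commit_choice()`, the generation counter, and `pick_first()` are all hypothetical names invented for illustration. Option 1 runs the strategy while the manager's lock is held; option 2 stamps each membership snapshot with a generation number and has the strategy retry if the stamp is out of date.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for the LoadManager; not TAO's real interface.
class LoadManager {
public:
  using Members = std::vector<std::string>;

  struct Snapshot {
    Members members;
    std::uint64_t generation;  // bumped on every membership change
  };

  void add_member(const std::string& m) {
    std::lock_guard<std::mutex> guard(lock_);
    members_.push_back(m);
    ++generation_;
  }

  void remove_member(const std::string& m) {
    std::lock_guard<std::mutex> guard(lock_);
    auto it = std::find(members_.begin(), members_.end(), m);
    if (it != members_.end()) {
      members_.erase(it);
      ++generation_;
    }
  }

  // Atomic on its own, but the result may be stale once it returns.
  Snapshot members() const {
    std::lock_guard<std::mutex> guard(lock_);
    return Snapshot{members_, generation_};
  }

  // Option 1: run the strategy with other threads locked out.
  // The strategy must not call the public methods above, since the
  // mutex is not recursive and that would deadlock.
  std::string choose_locked(
      const std::function<std::string(const Members&)>& strategy) const {
    std::lock_guard<std::mutex> guard(lock_);
    return strategy(members_);
  }

  // Option 2: accept a choice only if it was based on current data.
  // A real implementation would act on the choice here; this sketch
  // only reports whether the snapshot was still valid.
  bool commit_choice(const std::string& /*choice*/, std::uint64_t generation) {
    std::lock_guard<std::mutex> guard(lock_);
    return generation == generation_;
  }

private:
  mutable std::mutex lock_;
  Members members_;
  std::uint64_t generation_ = 0;
};

// Placeholder strategy: pick the first member, or "" if there is none.
std::string pick_first(const LoadManager::Members& members) {
  return members.empty() ? std::string() : members.front();
}

// Option 2 from the strategy's side: retry until the snapshot is current.
std::string choose_with_retry(LoadManager& manager) {
  for (;;) {
    LoadManager::Snapshot snap = manager.members();
    std::string choice = pick_first(snap.members);
    if (manager.commit_choice(choice, snap.generation)) return choice;
  }
}
```

Option 1 is simpler for strategy authors but holds the lock for as long as the strategy runs. Option 2 keeps lock hold times short at the cost of a retry loop and a rule that every strategy has to follow.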
lowering severity