Please report new issues athttps://github.com/DOCGroup
SYNOPSIS: Instances of TAO_SSLIOP_Endpoint contain a reference to a TAO_SSLIOP_Credentials object. This credentials object is not protected by a lock. According to the comment in SSLIOP_Endpoint.h, this is so that the transport cache can be scanned without additional locking. However, the SSLIOP connector replaces the credentials object in an endpoint in TAO_SSLIOP_Connector::ssliop_connect(). If the endpoint is in the transport cache and is accessed by another thread while the credentials object is being replaced, the application can dereference an invalid credentials pointer, and crash. DESCRIPTION: We have been running tests on our multithreaded distributed application, which is implemented using CORBA over SSLIOP. In one test, a very occasional crash is seen, which I believe is due to a race condition in the SSLIOP implementation. The thread that crashes is always scanning the transport cache. The relevant parts of the top of the thread's call stack are: tao_ssliopd.dll!TAO_SSLIOP_Endpoint::hash() taod.dll!TAO_Base_Transport_Property::hash() taod.dll!TAO_Cache_ExtId::hash() taod.dll!ACE_Hash<TAO_Cache_ExtId>::operator()() ... taod.dll!TAO_Transport_Cache_Manager::bind_i() taod.dll!TAO_Transport_Cache_Manager::cache_idle_transport() Analysis of the crashed application shows that the crash occurs when the TAO_SSLIOP_Endpoint::hash() attempts to dereference what it believes is a pointer to a TAO_SSLIOP_Credentials object. This pointer is invalid, leading to the failure. Further analysis of the machine code generated for the hash() function shows that, as one of the first steps in the function, the credentials_ member variable of the TAO_SSLIOP_Endpoint object (i.e. the pointer to the credentials object) is loaded into a register. At this point, the pointer is valid. However, about 10 instructions later the pointer value in the register is dereferenced. At this point, the pointer is invalid, and does not match the value in the credentials_ member any more. The value in credentials_ has changed, not the value in the register. The conclusion I drew from the above was that another thread was replacing the credentials object in the endpoint while the crashing thread was calling hash (). The replacement of the credentials object led to the previous credentials object being deleted, even though the thread in the hash() method was using it. Analysis of the other threads in the application shows that one other thread is attempting to make a call to a remote CORBA object via SSLIOP. The top of this thread's call stack is tao_ssliopd.dll!TAO_SSLIOP_Connector::ssliop_connect() tao_ssliopd.dll!TAO_SSLIOP_Connector::connect() taod.dll!TAO_GIOP_Invocation::perform_call() taod.dll!TAO_Default_Endpoint_Selector::endpoint_from_profile() taod.dll!TAO_Default_Endpoint_Selector::select_endpoint() taod.dll!TAO_GIOP_Invocation::start() taod.dll!TAO_GIOP_Twoway_Invocation::start() The endpoint passed to ssliop_connect() is the endpoint of the local ORB, so it is present in the transport cache. Looking at the ssliop_connect() code, after setting up a service handler, the connector makes the following calls to set up this local endpoint: ssl_endpoint->qop (qop); ssl_endpoint->trust (trust); ssl_endpoint->credentials (credentials.in ()); It appears that the last of these lines is the problem. If this line is executed while another thread is in TAO_SSLIOP_Endpoint::hash(), the updating of the credentials object causes the thread in hash() to fail, as it is not expecting the credentials object to be replaced, and has no way of protecting against it. As an aside, this problem was previously masked by bug #1508, which resulted in the credentials object being nil in almost all circumstances. This was all tested with TAO 1.3.0, but looking at CVS the problem is still present in the latest version. SAMPLE FIX/WORKAROUND: Having got this far, I was intending to produce a patch to fix this problem. However, there is a problem: If the requirement is that the transport cache can be scanned without holding additional locks, there is no completely safe way to replace the credentials member of the SSLIOP endpoint. I can see a couple of possible approaches: 1) Don't call the credentials object in TAO_SSLIOP_Endpoint::hash(). This is the temporary solution I've adopted, and it does prevent the crash in our test harness. However, calling other TAO_SSLIOP_Endpoint methods that access the credentials object could equally well cause this crash, so it is hardly an ideal solution. 2) Hold a reference to the old credentials object in ssliop_connect(), and only delete the old credentials object at the end of the function. This should ensure that any function using the old credentials object will exit before the credentials object is deleted. However, it does not guarantee that this will happen: it just relies on functions that access the credentials object only doing so for a short time. It could also fail as an approach on any architecture that cannot read or write a pointer value as an atomic operation. 3) Don't update the credentials object in the endpoint in ssliop_connect() if one is already present in the endpoint. Since the endpoint in question is that of the local ORB, it isn't clear to me whether it would be valid for the credentials on this endpoint to ever change, once created. 4) Protect access to the credentials_ member of TAO_SSLIOP_Endpoint with a lock. This should solve the problem, but will mean locks being acquired while scanning the transport cache. That could be addressed by either not including the credentials hash in the endpoint hash, or by precomputing the credentials hash and storing it as a member in the endpoint, so that the lock would not need to be held while calculating the hash. If someone could provide guidance on the approach to take, I will happily look into generating a patch to solve all this.
I'll look into the problem.
The caching of endpoint hash values implemented for 1.4.0 means that this problem doesn't show up any more. It is still theoretically possible that a race condition could occur though, if a method accessed the endpoint's credentials object while the SSLIOP Connector was replacing it.