Bug 3282

Summary: Bidirectional IIOP behaves poorly with multiple interfaces
Product: TAO
Reporter: Phil Mesnier <mesnierp>
Component: ORB
Assignee: Phil Mesnier <mesnierp>
Status: NEW
Severity: normal
CC: colding
Priority: P3
Version: 1.6.3   
Hardware: All   
OS: Linux   
Attachments: Anticipated fix for the primary bug

Description Phil Mesnier 2008-04-01 21:26:13 CDT
I noticed while testing on my local machine that if I start a server, 
say from the TAO/tests/BiDirectional test, with -ORBEndpoint 
iiop://localhost:12345 then start the client with a defaulted endpoint, 
the test fails.

The problem is that the Bidir filtering logic in 
TAO_IIOP_Transport::get_listen_point() has an optimization that seems to 
go all the way back to the original implementation of bidir. The code 
carries this note:
   // Note: Looks like there is no point in sending the list of
   // endpoints on interfaces on which this connection has not
   // been established. If this is wrong, please correct me.
   CORBA::String_var local_interface;

The code that follows then checks the supplied acceptor's endpoints to 
see if any match the interface address on which the client connection 
was established. Any endpoint that does not match is ignored.
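
To make the effect concrete, here is a minimal standalone sketch of that 
kind of filter. It is not the TAO code: ListenPoint, filter_by_interface 
and the host names are hypothetical, and it assumes the client reaches 
the localhost server over 127.0.0.1 while its own defaulted acceptor 
advertises a different host.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a listen point; not TAO's IIOP::ListenPoint.
struct ListenPoint {
    std::string host;
    unsigned short port;
};

// Keep only the acceptor endpoints whose host equals the local interface
// address of the established connection; everything else is dropped.
std::vector<ListenPoint> filter_by_interface(
    const std::vector<ListenPoint>& acceptor_endpoints,
    const std::string& local_interface)
{
    std::vector<ListenPoint> kept;
    for (const ListenPoint& ep : acceptor_endpoints) {
        if (ep.host == local_interface) {
            kept.push_back(ep);
        }
    }
    return kept;
}

// Assumed scenario: the acceptor advertises "client-host" and a LAN
// address, but the connection to the server was made over loopback.
void demo_filter()
{
    std::vector<ListenPoint> eps = {{"client-host", 40000},
                                    {"192.168.1.5", 40000}};
    // Nothing matches the loopback interface, so no listen point is sent.
    assert(filter_by_interface(eps, "127.0.0.1").empty());
}
```

With an empty listen point list the server has nothing to match against, 
which is consistent with the test failure described above.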

This actually causes two problems. First, in the trivial case 
illustrated above, it causes the test to fail. Second, when the client 
is on a multihomed host and the local connection to a server happens to 
be on one of the secondary interfaces, the server will not find the 
established bidir connection entry in the cache and will attempt to open 
a connection to the primary address.

Actually, this latter case also points out another problem. It seems 
that upon receiving a bidir IIOP service context, the server will 
iterate over the listen point list, dutifully recaching the transport 
for each new entry in the list. This seems to have the effect of 
leaving only the last entry of the listen point list in the cache, 
almost certainly guaranteeing that a redundant new connection will be 
created, or at least attempted. The solution to this would be to add 
multiple alias entries in the cache that all refer to the same 
transport. But the thought of implementing something like that really 
gives me the heebie jeebies!
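
The recaching behavior can be illustrated with a toy cache that allows 
only one endpoint per transport. This is a sketch under assumptions, not 
the TAO transport cache: OneToOneCache, recache and 
process_listen_points are hypothetical names, and endpoints are plain 
"host:port" strings.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Endpoint = std::string;  // "host:port", for illustration only
using TransportId = int;       // stand-in for a transport handle

// Hypothetical cache in which a transport is bound to exactly one endpoint.
class OneToOneCache {
public:
    // Rebinding a transport first drops its previous endpoint entry.
    void recache(TransportId t, const Endpoint& ep) {
        for (auto it = by_endpoint_.begin(); it != by_endpoint_.end();) {
            if (it->second == t) {
                it = by_endpoint_.erase(it);
            } else {
                ++it;
            }
        }
        by_endpoint_[ep] = t;
    }

    bool find(const Endpoint& ep) const {
        return by_endpoint_.count(ep) != 0;
    }

private:
    std::map<Endpoint, TransportId> by_endpoint_;
};

// Mimics the server-side loop: recache the same transport once per entry.
void process_listen_points(OneToOneCache& cache, TransportId t,
                           const std::vector<Endpoint>& list) {
    for (const Endpoint& ep : list) {
        cache.recache(t, ep);
    }
}

void demo_recache() {
    OneToOneCache cache;
    std::vector<Endpoint> list = {"10.0.0.5:4000", "192.168.1.5:4000"};
    process_listen_points(cache, 7, list);
    assert(!cache.find("10.0.0.5:4000"));    // earlier entry was discarded
    assert(cache.find("192.168.1.5:4000"));  // only the last one survives
}
```

A later lookup for any endpoint other than the last one misses the 
cache, so a new connection would be attempted.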

When I looked up Bidirectional IIOP and listen point lists in the CORBA 
3.0.2 spec, I didn't see any language that would imply that it is OK to 
limit the listen point to the originating interface address. And 
philosophically, it seems wrong to do so, since the endpoint might be 
aliased in the acceptor, and thus never match the connected interface. 
It seems to me that the client would want to give the server as much 
information as possible to give it the best chance possible of finding 
the bidir connection.

Comment 1 Phil Mesnier 2008-10-30 21:17:07 CDT
Created attachment 1022
Anticipated fix for the primary bug

The attached patch resolves the issue where a client on a multihomed host aggressively filters out endpoints that do not match the interface through which a connection was made. It also resolves the second half of the problem by only caching the first entry in the listen point list. 

The reason for caching only the first endpoint is that the connection cache currently has a 1-1 association between endpoints and transports. When a server receives a bidir context, it recaches the current transport, associating it with the newly received endpoint. If a list of endpoints is received, the transport ends up associated only with the last endpoint in the list, and that is surely the wrong outcome. 

Rather than rip out the processing loop completely, I've got it temporarily limiting the processing to no more than 1 element from the list. If the transport cache can be modified so that many endpoints can refer to the same transport, then iterating over the full list would be the right thing to do.

Comment 2 Phil Mesnier 2008-10-31 13:50:46 CDT
Turns out that the problem of bidir IIOP is not resolved by my patch. When a POA has an endpoint policy applied to it, it becomes possible that the first endpoint in the listen point list (as defined by the order of IIOP_Acceptors in the acceptor repository) may not be represented in an IOR generated by the POA. Thus when the recipient of a bidir context containing the listen point list only recaches the transport to the first endpoint in the list, the desired endpoint for contacting the target object won't be in the cache.

The solution to this problem is to figure out a way to allow more than one endpoint to refer to a single transport in the transport cache. At least that is the best I can think of at this time.
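
As a rough illustration of that idea only, the sketch below keeps several endpoint keys pointing at one shared transport object. AliasCache, Transport, add_aliases and purge are hypothetical names, not TAO classes, and the sketch ignores the locking, purging strategy and reference counting a real transport cache would need.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical transport; a real one would own a connection.
struct Transport {
    int id;
};

// Hypothetical cache in which many endpoint keys may alias one transport.
class AliasCache {
public:
    // Bind every endpoint in the listen point list to the same transport.
    void add_aliases(const std::shared_ptr<Transport>& t,
                     const std::vector<std::string>& endpoints) {
        for (const std::string& ep : endpoints) {
            by_endpoint_[ep] = t;
        }
    }

    // Returns the cached transport, or a null pointer on a miss.
    std::shared_ptr<Transport> find(const std::string& ep) const {
        auto it = by_endpoint_.find(ep);
        if (it == by_endpoint_.end()) {
            return std::shared_ptr<Transport>();
        }
        return it->second;
    }

    // Remove every alias of a transport, e.g. when its connection closes.
    void purge(const std::shared_ptr<Transport>& t) {
        for (auto it = by_endpoint_.begin(); it != by_endpoint_.end();) {
            if (it->second == t) {
                it = by_endpoint_.erase(it);
            } else {
                ++it;
            }
        }
    }

private:
    std::map<std::string, std::shared_ptr<Transport>> by_endpoint_;
};

void demo_aliases() {
    AliasCache cache;
    auto t = std::make_shared<Transport>(Transport{7});
    cache.add_aliases(t, {"10.0.0.5:4000", "192.168.1.5:4000"});
    // Either endpoint from the list now resolves to the same transport.
    assert(cache.find("10.0.0.5:4000") == t);
    assert(cache.find("192.168.1.5:4000") == t);
}
```

With this shape, it would no longer matter which endpoint from the listen point list ends up in an IOR, at the cost of having to remove all aliases together when the transport goes away.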