Bug 3557

Summary: Premature endpoint validation
Product: TAO Reporter: Phil Mesnier <mesnierp>
Component: ORBAssignee: DOC Center Support List (internal) <tao-support>
Status: REOPENED ---    
Severity: normal    
Priority: P3    
Version: 1.6.7   
Hardware: All   
OS: Linux   
Bug Depends on: 3700    
Bug Blocks:    
Attachments: potential fix for early validation

Description Phil Mesnier 2009-01-23 14:27:11 CST
Created attachment 1062 [details]
potential fix for early validation

I encountered a situation where a client wants to use bidir with a server through a firewall, but doesn't have a need or a desire to allow unsolicited connection attempts. At first this client simply used 127.0.0.1:<port> as its listen endpoint and this worked fine. However a later test involved a new client instance making a new connection to the server, using the same configuration, thus the same <port> value. In this case, the processing of the bidir listen point list caused this new connection to be recached using the same endpoint as the first. 

When the server tried to make an invocation on the new callback, the transport cache incorrectly returned the first transport and thus the invocation went to the wrong client. 

At this point, the thought occurred to me that the clients listen endpoint could be aliased with some made up string, such as a stringified uuid value. While this allows for recaching transports to unique endpoints, when the server tried to initiate a callback, the IIOP_Connector::set_validate_endpoint() method foiled me, as the endpoint is "validated" before looking in the cache to see if it matches an existing transport.

However, the endpoint which is set isn't actually used until a connection attempt is initiated, so I believe it is safe to move the call to set_validate_endpoint() to right before calling make_connection or make_parallel_connection. 

I have a patch which I've attached. This allows a client to use an unreachable alias in the callback IOR, but the connection attempt still fails if a server has an unreachable address in its IOR.
Comment 1 Phil Mesnier 2009-01-26 10:59:14 CST
I'm committing my patch and a variation to the bidirectional run_test script.
Comment 2 Phil Mesnier 2009-02-09 13:53:52 CST
No adverse affects were observed after applying the patch.
Comment 3 Johnny Willemsen 2009-06-12 14:12:12 CDT
reopen, this test does fail in around 50% of the runs, see the link below (takes a few minutes to load)

http://scoreboard.theaceorb.nl/test_stats/get_test_histgraph?period=last_year&test_id=2460&blowup=true

http://scoreboard.theaceorb.nl/test_stats/get_test_histgraph?period=last_month&test_id=2460
Comment 4 Phil Mesnier 2009-06-12 14:43:49 CDT
Hi Johnny, 
I'm not sure how the patch for this bug could be intermittent. It should either work 100% or not work. I guess I don't understand what your graphs are showing. Assuming is that BiDirectional/run_test3557.pl, I'm guessing that is an aggregate across multiple platforms, not all of which are run every day, and that the test always fails on some platforms and not others.

I think the problem is that a side-effect of the test is that ACE_INET_Addr::ACE_INET_Addr(u_short port_number,
                              const char host_name[],
                              int address_family)
always reports an error if the call to set() fails. On some platforms, notably HPUX, the reported error includes the word "ERROR". I did not want to remove this error report in general because on most platforms, something more expressive is reported.

So should we eliminate the error report in ACE_INET_Addr?
Comment 5 Johnny Willemsen 2009-06-14 13:30:01 CDT
Phil, if you go to http://scoreboard.theaceorb.nl, go to the test stats. If you then open tao tests, you find this test. If you then drill down you will find that this test always fails on windows. below the output of one of our systems, see the inet_addr error is not the problem

TAO/tests/BiDirectional/run_test3557.pl


ERROR: client timedout
ERROR: client returned -1
ERROR: server returned 3

auto_run_tests_finished: TAO/tests/BiDirectional/run_test3557.pl Time:21s Result:0
Comment 6 Phil Mesnier 2009-06-14 22:11:55 CDT
Interesting.

I just ran the test successfully on my laptop. Then I connected to the internet and tried again and it failed with a timeout. 

It seems as though the ACE_INET_Addr::set (u_short, const char[], int, int) uses ACE_OS::gethostbyname_r(), which on windows uses ::gethostbyname(). If on a network, presuably with a DNS server available, this call take some substantial time to fail.

It seems that occurs on every iteration because the ACE_INET_Addr::set() is called during the IIOP_Endpoint::hash(), since IIOP_Endpoint::hash() depends on the object_addr_ being set, so that it can use object_addr_::hash() as its own hash value. Since the ACE_INET_Addr::set() fails, the INET_Addr's hash() will return 0. This will cause the IIOP_Endpoint::hash() to presume it self unhashed the next time around.

The premise of this bug is that one should legitemately be able to use an unresolveable listen point within a bidirectional IIOP context. Consider a client host on a private domain behind some firewall. It wants to use bidir to supply a callback reference. The server receiving the callback should not be required to resolve the client's address in order to use the bidir connection.

I was able to get around the delay by changing the IIOP_Endpoint hash algorithm to use ACE::hash_pjw(host_) + port_. This runs the test much quicker. I'm reluctant to commit this change right now, since it is on the critical path.

Probably a better fix is to just alter the run_test3557.pl parameters, such as change the -i 100 on the server command line to something tiny, such as -i 3. 

I will commit a change to silence the error using:

ACE_INET_Addr addr;
addr.set(port,host);

rather than 

ACE_INET_Addr addr (port, host);


Comment 7 Johnny Willemsen 2016-04-08 06:09:18 CDT
Looks this failed recently on Win7x64_VC14_Release_IPv6_WChar