Summary: | Corba client start to leak when TCP connection failed. | ||
---|---|---|---|
Product: | TAO | Reporter: | Steven Xie <steven.xie> |
Component: | ORB | Assignee: | DOC Center Support List (internal) <tao-support> |
Status: | NEW --- | ||
Severity: | normal | CC: | elliott_c, milan.cvetkovic, mitza, steven.xie |
Priority: | P3 | ||
Version: | 1.6.8 | ||
Hardware: | x86 | ||
OS: | Solaris | ||
Attachments: |
Patch for connection handler leakage on Solaris platform.
Proposed patch to $TAO_ROOT/tao/Connection_Handler.cpp |
Description
Steven Xie
2009-03-16 14:21:51 CDT
Adam, could this be related to all changes of last summer? Steven, can you retest with 1.6.5, 1.6.6 and 1.6.7, then we know better where this got introduced? Steven, please recompile libTAO with -DTAO_LOG_CH_REF_COUNTS and run with -ORBDebugLevel 10. This will give you full logging of all connection handler reference count activity. ignore corba exception ACE (19535|1) SCG:<ctor=80475e0> - config=80685f0 repo=8068668 superceded by repo=8068668 ACE (19535|1) DSB::instance, repo=8068668, name=TAO_ORB_Core_Static_Resources type=8067290 => 806c3a8 TAO (19535|1) - Transport_Cache_Manager_T::fill_set_i: current_size = 0, cache_maximum = 2048 TAO (19535|1) - Transport_Cache_Manager_T::purge: Cache size after purging is [0] TAO (19535|1) - IIOP_Connector::begin_connection, to <fsa:9999> which should block TAO (19535|1) - TAO_LF_CH_Event[0]::state_changed_i state 2 prev 0 cons TAO_IIOP_Connection_Handler 11111111111111111111111111 TAO (19535|1) - IIOP_Connection_Handler[137028904] ctor, this = 137012240 add 0x82aa410 2 TAO (19535|1) - IIOP_Connection_Handler[137028904]::add_reference, up to 2 TAO (19535|1) - TAO_LF_CH_Event[137028904]::state_changed_i state 6 prev 2 0x82aa410 rraaaa 0x82aa410 1 TAO (19535|1) - IIOP_Connection_Handler[137028904]::remove_reference, down to 1 TAO (19535|1) - IIOP_Connector::make_connection, connection to <fsa:9999> failed (errno: Connection refused) TAO (19535|1) - Transport_Connector::connect, make_connection failed ACE (19535|1) SCG:<dtor=80475e0> - new repo=8068668 ignore corba exception Here is what I found, int TAO_Connection_Handler::close_handler (u_long flags) { ..... // We only need to remove the reference from the transport if there // were connections pending at the time that the handler is closed or // the handler is being closed during a new connection. if (pending || ACE_BIT_DISABLED(flags, CLOSE_DURING_NEW_CONNECTION)) transport->remove_reference (); From ACE_Connector, the falg is set as CLOSE_DURING_NEW_CONNECTION. In this case, remove_reference won't be called, which causes the leak. Based on the comments, Should it be if (pending || ACE_BIT_ENABLED(flags, CLOSE_DURING_NEW_CONNECTION)) transport->remove_reference (); instead? It works for me now. But I don't know why it's DISABLED checked? and what side effects will the change be? Added Chad, Chad, you added the lines Steven Xie points at, can you have a look? Created attachment 1124 [details]
Patch for connection handler leakage on Solaris platform.
I tested this patch on Solaris x86/Linux platform. Both work fine. (In reply to comment #7) > I tested this patch on Solaris x86/Linux platform. Both work fine. > Does this mean you have run the entire TAO test suite on those platforms with this patch? There's still some problem. I only catch this issue on Solaris, it doesn't show up on linux. This bug is related to this changes https://svn.dre.vanderbilt.edu/viewvc/Middleware/trunk/TAO/tao/Connection_Handler.cpp?r1=80603&r2=81742 This can be reproduced on linux machines as well: cat >block_connect.conf <<EOF dynamic Client_Strategy_Factory Service_Object * libTAO.so.1.6.8:_make_TAO_Default_Client_Strategy_Factory () "-ORBConnectStrategy blocked" EOF ./client -ORBSvcConf block_connect.conf -ORBDebugLevel 10 \ -k corbaloc:iiop:myhost:9999/whatever after making sure there is no corba server running on myhost:9999 The issue is exposed when different system errors we got from OS(by default linux won't see this issue, but solaris will see it right away). "pending" is always false based on the testings we did. Neither ACE_BIT_DISABLED(flags, CLOSE_DURING_NEW_CONNECTION) or ACE_BIT_ENABLED(flags, CLOSE_DURING_NEW_CONNECTION) covers all the cases. Is it ok to remove this condition check? "if (pending || ACE_BIT_DISABLED(flags, CLOSE_DURING_NEW_CONNECTION))" (In reply to comment #12) > The issue is exposed when different system errors we got from OS(by default > linux won't see this issue, but solaris will see it right away). > "pending" is always false based on the testings we did. > Neither ACE_BIT_DISABLED(flags, CLOSE_DURING_NEW_CONNECTION) or > ACE_BIT_ENABLED(flags, CLOSE_DURING_NEW_CONNECTION) covers all the cases. > Is it ok to remove this condition check? > "if (pending || ACE_BIT_DISABLED(flags, CLOSE_DURING_NEW_CONNECTION))" > for example, when connection failed on the same host, solaris will set errno=146, when connection failed on remote host, errno will be 11. The fix Steven mentioned in comment #6 (reversing the modifications in https://svn.dre.vanderbilt.edu/viewvc/Middleware/trunk/TAO/tao/Connection_Handler.cpp?r1=80603&r2=81742 seems to work properly. We have run all tests in TAO by executing $ACE_ROOT/bin/auto_run_tests.pl -o -t and found the exact same set of failed test cases with and without the patch. This modification was entered as part of fixes for Bug 2935 based on the ReleaseDoc: > Tue May 20 18:05:08 UTC 2008 Chad Elliott <elliott_c@ociweb.com> $TAO_ROOT/tests/Bug_2935_Regression does not fail. Created attachment 1155 [details] Proposed patch to $TAO_ROOT/tao/Connection_Handler.cpp The patch (by Steven.Xie at mpathix.com) used for full TAO test from Comment #14 what about using ACE_BIT_ENABLED instead of DISABLED? That patch, although consistent with the comments in code, is not good. It resulted with handle leakage in some scenarios. We never run the full TAO test, but we ran out of file handles in client pretty quickly when the server process was not running. (In reply to comment #16) > what about using ACE_BIT_ENABLED instead of DISABLED? > As comments 12, it's all depends on which errno we got from OS. ACE_BIT_ENABLED still has leaks under certain senario. The only option for us is remove this check fully. "if (pending || ACE_BIT_DISABLED(flags, CLOSE_DURING_NEW_CONNECTION))" Chad, any ideas as author of the patch? |