Please report new issues athttps://github.com/DOCGroup
This problem can show up in either a multi-threaded or a nested upcall situation. I'm working with it in a nested upcall situation. A [simplified] description of a typical scenario is: * Incoming request #1 triggers Outgoing Request #A * Outgoing Request A discovers that there is no connection to the endpoint, so it starts to open Connection Alpha. * It drops into the leader/follower + reactor to continue to service new events while waiting for Connection Alpha to complete. At this point it has not updated the transport cache * Incoming request #2 triggers Outgoing Request #B [A nested upcall] * Outgoing Request B discovers that there is no connection to the endpoint (Connection Alpha is invisible) so it opens Connection Beta and drops into the leader/follower + reactor. * Additional upcalls at this point would open additional connections but I won't complicate this scenario by describing the details. * Connection Alpha completes. The call back from the reactor adds Connection Alpha to the transport cache marked as busy because the code expects to use it immediately. * Additional upcalls may occur at this point. They will find Connection Alpha but it is busy, so they will open yet, another new connection. * If the stack unwinds back to OutgoingRequest #B it discovers Connection Beta is still not complete so it continues to wait. Notice that Connection Alpha is complete (but marked busy). * Connection Beta completes. The call back from the reactor adds Connection Beta to the cache. * Outgoing request B can now be delivered over Connection Beta. * Outgoing request A can now be delivered over Connection Alpha. After that the system operates "normally" I have a test for this and am working on a fix -- sponsored by ATD.
maybe this relates also to bug 2755
added bug 3000 as dependency because that is also related to cache management
Changes checked in for x.6 Wed Aug 1 15:54:01 UTC 2007 Dale Wilson <wilsond@ociweb.com>
reopened, code has been reverted from the repository because of test result drop. The regression test now fails again on head
Any update on a real fix of this issue (and related ones)?
Reassigning this bug to me.
Created attachment 967 [details] Dale's changes plus a few of my own. I plan on committing these changes very soon.
I will look at the diff in detail next week, but it seems some ACE_TEXT are missing from the ACE_DEBUG lines
(In reply to comment #8) > I will look at the diff in detail next week, but it seems some ACE_TEXT are > missing from the ACE_DEBUG lines > Why would that be necessary? There are two log methods on ACE_Log_Msg one that takes ACE_TCHAR* and one that takes ACE_ANTI_TCHAR* (when ACE_HAS_WCHAR is defined). As long as I'm not printing anything that has wide characters in it, ACE_TEXT is not necessary. If there is an actual reason why they need to be there that I don't know, I will be glad to go back and add them.
Shouldn't there be ACE_TEXT around the strings passed to ACE_DEBUG? For example: + + ACE_DEBUG ((LM_DEBUG, + "TAO (%P|%t) - Transport_Connector::connect, " + "got an existing %s Transport[%d] in role %s\n", + base_transport->is_connected () ? "connected" : + "unconnected", + base_transport->id (), + cr == TAO::TAO_SERVER_ROLE ? "TAO_SERVER_ROLE" : + cr == TAO::TAO_CLIENT_ROLE ? "TAO_CLIENT_ROLE" : + "TAO_UNSPECIFIED_ROLE" )); where there would be a ACE_TEXT around the messages. Also %s is for tchar strings, %C is for cstrings. I do know that this has been used incorrectly in the past. Would it be ok to hold this after x.6.5, we want to fix the service gestalt issue reporter on the mailing list asap and release a new micro release then
(In reply to comment #10) > Shouldn't there be ACE_TEXT around the strings passed to ACE_DEBUG? For > example: > > + > + ACE_DEBUG ((LM_DEBUG, > + "TAO (%P|%t) - Transport_Connector::connect, " > + "got an existing %s Transport[%d] in role %s\n", > + base_transport->is_connected () ? "connected" : > + "unconnected", > + base_transport->id (), > + cr == TAO::TAO_SERVER_ROLE ? "TAO_SERVER_ROLE" : > + cr == TAO::TAO_CLIENT_ROLE ? "TAO_CLIENT_ROLE" : > + "TAO_UNSPECIFIED_ROLE" )); > > where there would be a ACE_TEXT around the messages. Also %s is for tchar > strings, %C is for cstrings. I do know that this has been used incorrectly in > the past. As long as wide characters are not involved at all, ACE_TEXT is not necessary. > > Would it be ok to hold this after x.6.5, we want to fix the service gestalt > issue reporter on the mailing list asap and release a new micro release then > I don't see a problem with that.
Is this now in svn?
This is now in the repository with the following ChangeLog entries. ACE: Tue May 20 17:55:39 UTC 2008 Chad Elliott <elliott_c@ociweb.com> Mon Jun 16 19:06:44 UTC 2008 Chad Elliott <elliott_c@ociweb.com> TAO Tue May 20 18:05:08 UTC 2008 Chad Elliott <elliott_c@ociweb.com> Thu May 22 11:43:44 UTC 2008 Chad Elliott <elliott_c@ociweb.com> Fri May 30 13:21:20 UTC 2008 Chad Elliott <elliott_c@ociweb.com> Tue Jun 3 15:15:10 UTC 2008 Chad Elliott <elliott_c@ociweb.com> Thu Jun 5 11:56:09 UTC 2008 Chad Elliott <elliott_c@ociweb.com> Thu Jun 5 18:02:10 UTC 2008 Chad Elliott <elliott_c@ociweb.com> Wed Jun 11 15:54:42 UTC 2008 Chad Elliott <elliott_c@ociweb.com> Wed Jun 11 19:07:03 UTC 2008 Chad Elliott <elliott_c@ociweb.com> Wed Jun 18 13:22:32 UTC 2008 Chad Elliott <elliott_c@ociweb.com> Wed Jun 18 14:39:38 UTC 2008 Chad Elliott <elliott_c@ociweb.com>
reopen, this seems to have a too high fail rate for the test stats at this moment: 2008_07_30 23 7.0 69.57 TAO TAO/tests/Bug_2935_Regression
(In reply to comment #14) > reopen, this seems to have a too high fail rate for the test stats at this > moment: > > 2008_07_30 23 7.0 69.57 TAO TAO/tests/Bug_2935_Regression > Latest fix has been committed. I'll check the scoreboard tomorrow to see if we can close this bug. Thu Aug 7 17:46:45 UTC 2008 Adam Mitz <mitza@ociweb.com> This commit fixes Bug 2935 (again) and resolves other transport cache management issues. The key issue of 2935 is that the transport is cached while the connection is in progress, so that other threads see that there's a connection in progress and don't start their own. That feature is essentially restoring one of the changes from Tue May 20 18:05:08 UTC 2008 Chad Elliott <elliott_c@ociweb.com> that were reverted in Fri May 30 13:21:20 UTC 2008 Chad Elliott <elliott_c@ociweb.com> (see $TAO_ROOT/ChangeLog for more details)
(In reply to comment #15) > Latest fix has been committed. I'll check the scoreboard tomorrow to see if we > can close this bug. There were test failures relating to non-blocking connects. These tests failed: TAO/tests/Bug_1476_Test/run_test.pl TAO/tests/Timed_Buffered_Oneways/run_test.pl TAO/tests/Blocking_Sync_None/run_test.pl TAO/tests/Oneway_Buffering/run_message_count.pl TAO/tests/Oneway_Buffering/run_buffer_size.pl TAO/tests/Oneway_Buffering/run_timeout.pl TAO/tests/Oneway_Buffering/run_timeout_reactive.pl TAO/tests/AMI_Buffering/run_message_count.pl TAO/tests/AMI_Buffering/run_buffer_size.pl TAO/tests/AMI_Buffering/run_timeout_reactive.pl Should be fixed by: Fri Aug 8 15:25:13 UTC 2008 Adam Mitz <mitza@ociweb.com> * tao/Transport_Cache_Manager.cpp: Fixed whitespace. * tao/Transport_Connector.cpp: Add the transport to the cache (ENTRY_CONNECTING) in the non-blocked case. This covers non-blocking invocations such as AMI and oneways with SYNC_NONE, etc.
Should have been closed when x.6.6 was released.