Summary: | issue with error recovery with rw strategy | ||
---|---|---|---|
Product: | TAO | Reporter: | Johnny Willemsen <jwillemsen> |
Component: | ORB | Assignee: | DOC Center Support List (internal) <tao-support> |
Status: | NEW --- | ||
Severity: | normal | ||
Priority: | P3 | ||
Version: | 1.6.9 | ||
Hardware: | All | ||
OS: | Solaris | ||
Bug Depends on: | |||
Bug Blocks: | 3682 |
Description
Johnny Willemsen
2009-05-29 05:01:10 CDT
When trying to send a very large message on Solaris we see this behaviour below. The client just does 1 call, the ORB seems to do a retry which loops forever. This is using the bug_3683_regression which just has been added to the repository, only echo_client_i is modified to pass a very long string. 74 20 61 62 72 65 61 73 74 20 61 62 72 69 64 67 t abreast abridg 65 20 61 62 72 69 64 67 6d 65 6e 74 20 61 62 72 e abridgment abr 6f 61 64 20 61 62 72 6f 67 61 74 65 20 61 62 72 oad abrogate abr 75 70 74 20 61 62 73 63 65 73 73 20 61 62 73 63 upt abscess absc 69 73 73 61 20 61 62 73 63 69 73 73 61 65 20 61 issa abscissae a 62 73 65 6e 63 65 20 61 62 73 65 6e 74 20 61 62 bsence absent ab 73 65 6e 74 65 65 20 61 62 73 65 6e 74 65 65 69 sentee absenteei 73 6d 20 61 62 73 65 6e 74 69 61 20 61 62 73 65 sm absentia abse 6e 74 6d 69 6e 64 65 64 20 61 62 73 69 6e 74 68 ntminded absinth 65 20 61 62 73 6f 6c 75 74 65 20 61 62 73 6f 6c e absolute absol 75 74 69 6f 6e 20 61 62 73 6f 6c 76 65 20 61 62 ution absolve ab TAO (4557|1) - Transport[9]::cleanup_queue, byte_count = 65536 TAO (4557|1) - Transport[9]::cleanup_queue, after transfer, bc = 0, all_sent = 0, ml = 761248 TAO (4557|1) - Transport[9]::drain_queue_helper, byte_count = 65536, head_is_empty = 0 TAO (4557|1) - Transport[9]::drain_queue_i, helper retval = 1 TAO (4557|1) - Transport[9]::send_synchronous_message_i, error while scheduling flush - Not enough space TAO (4557|1) - Connection_Handler[9]::close_connection_eh, purging entry from cache TAO (4557|1) - Transport_Cache_Manager_T::mark_connected, false Transport[9] TAO (4557|1) - Transport[9]::cleanup_queue_i, cleaning up complete queue TAO (4557|1) - Transport[9]::cleanup_queue_i, discarded 0 messages, 0 bytes. TAO (4557|1) - Transport[9]::cleanup_queue_i, cleaning up complete queue TAO (4557|1) - Transport[9]::cleanup_queue_i, discarded 0 messages, 0 bytes. TAO (4557|1) - TAO_LF_CH_Event[9]::state_changed_i, state LFS_SUCCESS->LFS_CONNECTION_CLOSED TAO (4557|1) - Connection_Handler[9]::close_connection_eh TAO (4557|1) - IIOP_Transport[9]::send_message, write failure - Not enough space TAO (4557|1) - Remote_Invocation::send_message, failure while sending message TAO (4557|1) - Connection_Handler[9]::close_connection_eh, purging entry from cache TAO (4557|1) - Transport[9]::cleanup_queue_i, cleaning up complete queue TAO (4557|1) - Transport[9]::cleanup_queue_i, discarded 0 messages, 0 bytes. TAO (4557|1) - Transport[9]::cleanup_queue_i, cleaning up complete queue TAO (4557|1) - Transport[9]::cleanup_queue_i, discarded 0 messages, 0 bytes. TAO (4557|1) - Connection_Handler[9]::close_connection_eh TAO (4557|1) - Transport[9]::make_idle TAO (4557|1) - Transport[9]::cleanup_queue_i, cleaning up complete queue TAO (4557|1) - Transport[9]::cleanup_queue_i, discarded 0 messages, 0 bytes. TAO (4557|1) - Invocation_Adapter::invoke_i, handling forwarded locations TAO (4557|1) - Request timeout is 15768000000 milliseconds TAO (4557|1) - Transport_Cache_Manager_T::fill_set_i, current_size = 0, cache_maximum = 128 TAO (4557|1) - Transport_Cache_Manager_T::purge, Cache size after purging is [0] TAO (4557|1) - IIOP_Connector::begin_connection, to <css5:35878> which should block TAO (4557|1) - TAO_LF_CH_Event[0]::state_changed_i, state LFS_IDLE->LFS_CONNECTION_WAIT TAO (4557|1) - IIOP_Connection_Handler[8134192] ctor, this=612630 TAO (4557|1) - IIOP_Connection_Handler::open, The local addr is <136.225.108.120:44248> TAO (4557|1) - IIOP_Connection_Handler::open, IIOP connection to peer <136.225.108.120:35878> on 9 TAO (4557|1) - Transport::post_open, tport id changed from 8134192 to 9 TAO (4557|1) - Transport[9]::post_open, cache_map_entry_ is 0 TAO (4557|1) - TAO_LF_CH_Event[9]::state_changed_i, state LFS_CONNECTION_WAIT->LFS_SUCCESS TAO (4557|1) - Cache_IntId_T::Cache_IntId_T, this=fffffd7fffdfe2f0 Transport[9] is connected TAO (4557|1) - Cache_IntId_T::recycle_state, ENTRY_UNKNOWN->ENTRY_IDLE_AND_PURGABLE Transport[9] IntId=fffffd7fffdfe2f0 TAO (4557|1) - Transport_Cache_Manager_T::bind_i, Transport[9] @ hash:index{-1998456674:0} TAO (4557|1) - Transport_Cache_Manager_T::bind_i: Success Transport[9] @ hash:index{-1998456674:0}. Cache size is [1] TAO (4557|1) - IIOP_Connector::make_connection, new connected connection to <css5:35878> on Transport[9] TAO (4557|1) - Transport_Connector::connect, opening Transport[9] in TAO_CLIENT_ROLE TAO (4557|1) - Transport_Cache_Manager_T::is_entry_available_i[9], true state is ENTRY_IDLE_AND_PURGABLE TAO (4557|1) - Cache_IntId_T::recycle_state, ENTRY_IDLE_AND_PURGABLE->ENTRY_BUSY Transport[9] IntId=466560 TAO (4557|1) - Transport_Cache_Manager_T::find_i, Found available Transport[9] @hash:index {-1998456674:0} TAO (4557|1) - Transport_Connector::connect, got an existing connected Transport[9] in role TAO_CLIENT_ROLE TAO (4557|1) - Codeset_Manager_i::set_tcs, setting char translator (00010001) TAO (4557|1) - Codeset_Manager_i::set_tcs, setting wchar translator (00010109) TAO (4557|1) - Exclusive_TMS::request_id - <1> TAO (4557|1) - Codeset_Manager_i::generate_service_context, using tcs_c <ISO8859_1> (00010001), tcs_w <UTF-16> (00010109) TAO (4557|1) - GIOP_Message_Base::dump_msg, send GIOP message v1.2, 826772 data bytes, my endian, Type Request[1] GIOP message - HEXDUMP 826784 bytes (showing first 912 bytes) 47 49 4f 50 01 02 01 00 94 9d 0c 00 01 00 00 00 GIOP............ 03 00 00 00 00 00 00 00 31 00 00 00 14 01 0f 00 ........1....... 4e 55 50 00 00 00 1a 00 00 00 00 01 00 00 00 52 NUP............R 6f 6f 74 50 4f 41 00 63 68 69 6c 64 5f 70 6f 61 ootPOA.child_poa 00 00 00 00 00 01 00 00 00 45 63 68 6f 00 00 00 .........Echo... 0c 00 00 00 65 63 68 6f 5f 73 74 72 69 6e 67 00 ....echo_string. the problem doesn't appear when we don't use rw Seems we need to flush the data explicitly, but the flushing strategy is LF based by defalt. By using "static Client_Strategy_Factory "-ORBClientConnectionHandler RW"" we don't use LF, but the flushing strategy was not changed. Will retest with a different flushing strategy. if the flushing strategy gives the problem, the orb at least shouldn't keep retrying I think there are two bugs, first the flushing strategy can use LF and LF can't be used for clientconnection handling Also, when the flush fails, we give a TAO_INVOKE_RESTART back to Invocation_Adapter::invoke_i. This assumes we got a location forward. The problem is that we only should do a location forward when for example Invocation_Adapter::invoke_twoway does return TAO_INVOKE_RESTART and we have a reply status indicating location forward. I think we need to pass this reply status back to Invocation_Adapter::invoke_i and only do a look while status == restart and we have a reply status location forward if (status == TAO_INVOKE_RESTART && (synch.reply_status () == GIOP::LOCATION_FORWARD || synch.reply_status () == GIOP::LOCATION_FORWARD_PERM)) Invocation_Adapter::invoke_i is definitively wrong, we should check whether we really got a location forward or not. This would mean breaking several internal APIs Tue Jun 2 06:47:43 UTC 2009 Johnny Willemsen <jwillemsen@remedy.nl> * docs/Options.html: When we use ClientConnectionHandler RW we don't use the LF loop. In case the buffers are full, we use the flushing strategy for which the default is leader follower, but that will not work because we don't use the lf loop in this case. Added a note the ClientConnectionHandler RW needs a FlusingStrategy of blocking * tests/Bug_3683_Regression/Echo_Client_i.cpp * tests/Bug_3683_Regression/Echo_Client_i.h * tests/Bug_3683_Regression/run_test.pl * tests/Bug_3683_Regression/Simple_util.cpp Use a commandline argument -p to indicate the size of the string in megabytes. * tests/Bug_3683_Regression/svc.conf Set the FlushingStrategy to blocking this is more a configuration issue, RW and blocking flushing should be used together. if RW and lf flushing is used, it exhibits an issue with the error recovery in the orb |