Bug 583

Summary: TAO's buffering queue can grow without bounds when the server is really slow
Product: TAO
Reporter: Irfan Pyarali <irfan>
Component: ORB
Assignee: DOC Center Support List (internal) <tao-support>
Status: ASSIGNED ---    
Severity: normal    
Priority: P3    
Version: 1.1   
Hardware: All   
OS: All   
Bug Depends on: 132    
Bug Blocks:    

Description Irfan Pyarali 2000-06-02 22:34:26 CDT
The eager request buffering scheme in TAO works as follows: the
request is first queued; then, if the buffering constraints have been
reached, the queued messages are sent to the server.  The delayed
buffering scheme, in contrast, tries to send the request to the server
immediately if there are no requests in the queue.
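A minimal sketch of the two schemes, assuming a toy transport whose only
buffering constraint is a message count.  `Transport`, `Constraints`,
`send_eager`, `send_delayed` and the `blocked` flag (standing in for full
network buffers) are hypothetical names, not TAO's API.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical stand-ins for TAO internals; names are illustrative only.
struct Constraints {
  std::size_t max_messages;  // flush once this many messages are queued
};

class Transport {
 public:
  explicit Transport(Constraints c) : constraints_(c) {
    assert(c.max_messages > 0);
  }

  // Eager buffering: always queue first, then flush if the constraint is met.
  void send_eager(const std::string& msg) {
    queue_.push_back(msg);
    if (queue_.size() >= constraints_.max_messages) flush();
  }

  // Delayed buffering: send immediately when nothing is queued and the
  // (simulated) network is writable; otherwise fall back to queueing.
  void send_delayed(const std::string& msg) {
    if (queue_.empty() && !blocked_) {
      wire_.push_back(msg);
      return;
    }
    send_eager(msg);
  }

  void set_blocked(bool b) { blocked_ = b; }  // simulates full network buffers
  const std::vector<std::string>& wire() const { return wire_; }
  std::size_t queued() const { return queue_.size(); }

 private:
  // Write queued messages out until the queue drains or the network blocks.
  void flush() {
    while (!queue_.empty() && !blocked_) {
      wire_.push_back(queue_.front());
      queue_.pop_front();
    }
  }

  Constraints constraints_;
  bool blocked_ = false;
  std::deque<std::string> queue_;
  std::vector<std::string> wire_;  // what has reached the "server"
};
```

Note that when the network stays blocked, nothing ever leaves the queue in
either scheme, which is the unbounded growth this bug is about.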

TAO supports timed sends, i.e., the send operation will time out if
the server is slow in reading and responding to the requests and the
network buffers are full.  In this case, the requests are queued for
later delivery.  

The problem with the above is that the queuing is unbounded and
therefore may result in many requests being queued, which could
consume a lot of memory.  TAO needs to provide a way for the user to
bound the queue.  When this bound is reached, further requests
should not be queued and the ORB should raise a TRANSIENT exception.

This will require two changes: (a) add some way for the user to specify
this bound on the request queue (one way could be through an ORB
option) and (b) raise a TRANSIENT exception when the enqueue fails
because the bound has been reached.  It would be nice if the bound in
(a) could be specified in number of messages and/or number of bytes
consumed by the queued messages.
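A minimal sketch of (a) and (b), assuming the bounds arrive in a plain
struct (how they would be parsed from an ORB option is left open).
`QueueBounds`, `BoundedRequestQueue` and `Transient` are hypothetical;
`Transient` only stands in for CORBA::TRANSIENT so the example compiles
without TAO.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <stdexcept>
#include <string>
#include <utility>

// Stand-in for CORBA::TRANSIENT; a real ORB would raise the system exception.
class Transient : public std::runtime_error {
 public:
  Transient() : std::runtime_error("TRANSIENT: request queue bound reached") {}
};

// Hypothetical user-configurable bounds; 0 means "no limit" for that field.
struct QueueBounds {
  std::size_t max_messages = 0;
  std::size_t max_bytes = 0;
};

class BoundedRequestQueue {
 public:
  explicit BoundedRequestQueue(QueueBounds b) : bounds_(b) {}

  // Queue a request, or throw Transient if either bound would be exceeded.
  // The check happens before any mutation, so a rejected request leaves
  // the queue unchanged.
  void enqueue(std::string request) {
    const bool too_many =
        bounds_.max_messages != 0 && queue_.size() + 1 > bounds_.max_messages;
    const bool too_big =
        bounds_.max_bytes != 0 && bytes_ + request.size() > bounds_.max_bytes;
    if (too_many || too_big) throw Transient();
    bytes_ += request.size();
    queue_.push_back(std::move(request));
  }

  // Remove the oldest request once it has been written to the network.
  std::string dequeue() {
    assert(!queue_.empty());
    std::string front = std::move(queue_.front());
    queue_.pop_front();
    bytes_ -= front.size();
    return front;
  }

  std::size_t size() const { return queue_.size(); }
  std::size_t bytes() const { return bytes_; }

 private:
  QueueBounds bounds_;
  std::deque<std::string> queue_;
  std::size_t bytes_ = 0;  // running total of queued payload sizes
};
```

Leaving both fields at 0 reproduces the current unbounded behaviour, so a
bound would only apply when the user asks for one.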
Comment 1 Irfan Pyarali 2000-06-02 22:37:31 CDT
There is no hurry on this one.  I just wanted to document it in
bugzilla.  If your application has too many queued messages because of
a slow server, there is something wrong with your system design.
Comment 2 Carlos O'Ryan 2001-02-08 00:31:32 CST
The fixes for 132 should help out with this, or at least make it easier to 
handle it.
Comment 3 Carlos O'Ryan 2001-04-24 19:11:56 CDT
This is an idea to minimize this problem:

- The ORB could record the timeout value for each message on the queue.  Adding
this feature to the TAO_Queued_Message is trivial.

- Either periodically or based on the next message that expires, the ORB could
iterate over all the messages and remove those that have expired.  Computing the
next timeout is not hard because TAO_Transport::check_buffering_constraints()
already iterates over the complete list.

- For oneways the messages can simply be dropped.  Twoways are a little more
tricky: we need to know why the message stopped.  Probably the easiest approach
is to store a 'message has timed out' flag in TAO_Synch_Queued_Message, but the
problem is waking up the thread that is trying to send the message.

- For AMI requests we will need to revive my TAO_Message_Callback idea:
each TAO_Asynch_Queued_Message may need to signal somebody (the AMI
Reply Handler or DII Request) that the request timed out and was
removed from the queue.
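A rough sketch of the expiry sweep described in the list above, assuming
each queued message records an absolute deadline when it is queued.
`QueuedMessage`, `ExpiringQueue`, `Kind` and `purge_expired` are
hypothetical names, not the TAO_Queued_Message classes.  The twoway case
only sets a flag; waking the blocked sender is left out, since that is the
open problem noted in the list.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <list>
#include <memory>
#include <utility>

using Clock = std::chrono::steady_clock;

// Hypothetical model of a queued request.
enum class Kind { Oneway, Twoway, Asynch };

struct QueuedMessage {
  Kind kind = Kind::Oneway;
  Clock::time_point deadline;        // recorded when the request is queued
  bool timed_out = false;            // checked by the twoway sending thread
  std::function<void()> on_timeout;  // AMI/DII: notify the reply handler
};

class ExpiringQueue {
 public:
  void push(std::shared_ptr<QueuedMessage> m) {
    assert(m);
    queue_.push_back(std::move(m));
  }

  // Remove every expired message, handling it according to its kind, and
  // return the earliest remaining deadline (Clock::time_point::max() if
  // nothing is left) so the caller can schedule the next sweep.
  Clock::time_point purge_expired(Clock::time_point now) {
    Clock::time_point next = Clock::time_point::max();
    for (auto it = queue_.begin(); it != queue_.end();) {
      QueuedMessage& m = **it;
      if (m.deadline <= now) {
        switch (m.kind) {
          case Kind::Oneway:
            break;  // simply dropped
          case Kind::Twoway:
            m.timed_out = true;  // a real ORB must also wake the sender
            break;
          case Kind::Asynch:
            if (m.on_timeout) m.on_timeout();
            break;
        }
        it = queue_.erase(it);
      } else {
        if (m.deadline < next) next = m.deadline;
        ++it;
      }
    }
    return next;
  }

  std::size_t size() const { return queue_.size(); }

 private:
  std::list<std::shared_ptr<QueuedMessage>> queue_;
};
```

Messages are held through shared_ptr so that a twoway sender still owning
its message can observe the flag after the message leaves the queue.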

In any case, a framework like that is not that complicated now, but it is
beyond what I can do in the current time frame.
Comment 4 Carlos O'Ryan 2001-07-31 18:11:12 CDT
In the previous comment I stated that some extra state was required to time out
twoway requests.  With the incoming changes to fix 886, all Queued_Messages are
LF_Events, so a queued message can be put in the TAO_LF_Event::LFS_TIMEOUT
state.  Changes in an LF_Event's state participate in the L/F event loop, as
well as in the block-on-read and block-on-reactor waiting strategies.
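A minimal sketch of why this helps, assuming a mutex/condition-variable
event in place of TAO's leader/follower machinery: once something moves a
queued message into a timeout state, the blocked sending thread wakes up and
can see why it stopped.  `LFEventSketch` and `EventState` are hypothetical;
the real TAO_LF_Event works through the L/F loop and the waiting strategies
rather than a plain condition variable.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical states, loosely mirroring the idea of LFS_TIMEOUT.
enum class EventState { Active, Success, Timeout };

class LFEventSketch {
 public:
  // Called by whoever completes or expires the request (e.g. a queue purger).
  void set_state(EventState s) {
    assert(s != EventState::Active);  // only transitions out of Active
    {
      std::lock_guard<std::mutex> lock(mutex_);
      state_ = s;
    }
    cv_.notify_all();  // wake any thread blocked in wait()
  }

  // Called by the thread that sent the twoway request; blocks until the
  // event leaves the Active state or `max_wait` elapses, then reports it.
  EventState wait(std::chrono::milliseconds max_wait) {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait_for(lock, max_wait,
                 [this] { return state_ != EventState::Active; });
    return state_;
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  EventState state_ = EventState::Active;
};
```

In this model, the expiry sweep from the previous comment would call
`set_state(EventState::Timeout)` on each expired twoway message instead of
just setting a flag, which removes the "how do we wake the sender" problem.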

In other words, this feature looks much easier to implement now.