Bug 1365

Summary:	Test LB in presence of crashed/re-started members.
Product:	TAO	Reporter:	Ossama Othman <ossama.othman>
Component:	Load Balancer	Assignee:	DOC Center Support List (internal) <tao-support>
Status:	NEW ---
Severity:	normal
Priority:	P1
Version:	1.2.5
Hardware:	All
OS:	All
Bug Depends on:
Bug Blocks:	1277

Description Ossama Othman 2002-11-17 18:31:08 CST

The new LB ("Cygnus") must be tested in the presence of crashed/re-started
object group members.  It must react to exceptional situations and misbehaving
object group members in a predictable and reasonable manner.

Comment 1 Ossama Othman 2002-11-17 18:31:46 CST

Must be done for the TAO 1.3 release.

Comment 2 Ossama Othman 2005-04-23 13:11:37 CDT

Additional discussions on this issue ...

On Mon, 2005-04-11 at 22:12 +0200, Jesper Söderlund wrote:
> How does the load balancing algorithm implemented by TAO cope with
> the disappearance of a replica?

It currently does not.  The LoadManager currently assumes that all group
members are reachable.  However, it should be relatively straightforward
to implement a simple heartbeat/ping mechanism that is configurable at
run-time through the PortableGroup PropertyManager interface.
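
A minimal sketch of what such a heartbeat check could look like, written
in Python for brevity rather than against any TAO interface.  The class
name HeartbeatMonitor, the ping callback, and the max_missed threshold
are all hypothetical; in a real implementation the threshold would
presumably be one of the run-time configurable properties.

```python
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Toy liveness tracker for group members (hypothetical, not a TAO API)."""

    def __init__(self, ping: Callable[[str], bool], max_missed: int = 3) -> None:
        self._ping = ping              # returns True if the member answered
        self._max_missed = max_missed  # consecutive misses before "dead"
        self._missed: Dict[str, int] = {}

    def add_member(self, member: str) -> None:
        self._missed[member] = 0

    def poll_once(self) -> List[str]:
        """Ping every member once and return those newly declared dead."""
        dead = []
        for member in list(self._missed):
            if self._ping(member):
                self._missed[member] = 0  # any answer resets the counter
                continue
            self._missed[member] += 1
            if self._missed[member] >= self._max_missed:
                dead.append(member)
                del self._missed[member]  # stop offering it to clients
        return dead

    def live_members(self) -> List[str]:
        return list(self._missed)
```

Requiring several consecutive missed pings before removing a member is
one way to avoid dropping a member that is merely slow; whether that
trade-off is appropriate depends on the deployment.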

> Would it not, until it itself found out that the replica was gone,
> redirect clients to a "dead" node? How can the clients cope with this
> gracefully?

Currently, the LoadManager could potentially redirect the client to a
dead node.  If the client cannot connect to the group member (e.g. a
CORBA::TRANSIENT condition), I believe that the client side ORB should
redirect the request back to the previous location, which in this case
is the LoadManager.  Once redirected back to the LoadManager, the
LoadManager will reassign the client request to another group member.
The same goes for other non-fatal exceptions, such as CORBA::OBJ_ADAPTER
and CORBA::NO_RESPONSE.
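
The fallback described above can be illustrated with a toy model.  This
is only a sketch under stated assumptions: the exception classes are
Python stand-ins for the CORBA exceptions named above, ToyLoadManager
uses plain round-robin assignment (not necessarily what the LoadManager
does), and invoke_with_fallback is a hypothetical helper that models
what the client side ORB would do transparently.

```python
from typing import Callable, List


class Transient(Exception):
    """Stand-in for CORBA::TRANSIENT."""


class ObjAdapter(Exception):
    """Stand-in for CORBA::OBJ_ADAPTER."""


class NoResponse(Exception):
    """Stand-in for CORBA::NO_RESPONSE."""


class Fatal(Exception):
    """Stand-in for any exception the client must handle itself."""


NON_FATAL = (Transient, ObjAdapter, NoResponse)


class ToyLoadManager:
    """Assigns members round-robin (an assumption made for this sketch)."""

    def __init__(self, members: List[str]) -> None:
        self._members = members
        self._next = 0

    def assign(self) -> str:
        member = self._members[self._next % len(self._members)]
        self._next += 1
        return member


def invoke_with_fallback(manager: ToyLoadManager,
                         send: Callable[[str], str]) -> str:
    """Try the assigned member; on a non-fatal error, ask the manager again."""
    member = manager.assign()
    try:
        return send(member)
    except NON_FATAL:
        # Fall back to the previous location (the load manager), which
        # reassigns the request.  Fatal exceptions are not caught here
        # and reach the caller unchanged.
        return send(manager.assign())
```

Note that this sketch falls back only once; if the second member also
fails, the exception reaches the caller.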

Fatal exceptions must be handled by the client as they normally would
without load balancing.  Fatal exception handling related to dead nodes
will be mitigated once the dead node detection support is added to the
LoadManager.  However, the client should still be able to handle fatal
exceptions with or without load balancing.

We'll try to add support for dead member detection ASAP, since the client
could be bounced back and forth between the LoadManager and the group
member indefinitely if the load balanced object group has only one
member and that member is dead.
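
Until dead member detection exists, one possible stopgap is to cap the
number of redirects.  The sketch below is a toy model in Python, not
TAO code: Transient is a stand-in for CORBA::TRANSIENT, and
invoke_bounded, assign, send, and max_redirects are hypothetical names
introduced for illustration.

```python
from typing import Callable, Optional


class Transient(Exception):
    """Stand-in for CORBA::TRANSIENT."""


def invoke_bounded(assign: Callable[[], str],
                   send: Callable[[str], str],
                   max_redirects: int = 5) -> str:
    """Retry through the load manager, but give up after max_redirects."""
    last_error: Optional[Exception] = None
    # One initial attempt plus at most max_redirects reassignments.
    for _ in range(max_redirects + 1):
        member = assign()  # the load manager picks a member
        try:
            return send(member)
        except Transient as exc:
            last_error = exc  # member unreachable; go back and ask again
    # Every attempt failed, e.g. the group's only member is dead.
    raise Transient(f"gave up after {max_redirects} redirects") from last_error
```

With a single dead member, assign() keeps returning the same member, so
without the cap the loop would never terminate; with it, the client sees
an ordinary exception after a bounded number of attempts.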