3387 – Notification Service fails to send to reconnected consumer (edge case)

Status:	NEW

Alias:	None

Product:	TAO
Classification:	Unclassified
Component:	Notification Service
Version:	1.6.5
Hardware:	All All

Importance:	P3 normal
Assignee:	Rich Seibel

URL:

Depends on:
Blocks:

Reported:	2008-08-05 13:07 CDT by Rich Seibel
Modified:	2008-08-05 13:07 CDT
CC List:	0 users

See Also:

Description Rich Seibel 2008-08-05 13:07:34 CDT

This is a bug reported by an OCI customer.

This problem was found using VxWorks, but is not limited to VxWorks.  It
can occur on any RTE platform, or on standard platforms when the
-ORBEndPoint option is used.

The problem is in the Notification Service.  It is an edge case that can
be reproduced with the following scenario.  Start a normal Notification
Service on a host (not VxWorks).  Start a consumer on VxWorks and a
producer on a host (not the same machine as the consumer, but it can be
the same as the Notification Service).  The producer sends a continuous
stream of messages to the consumer; it doesn't have to be fast, just
faster than a VxWorks reboot.

Two possible paths follow.

In the first, the VxWorks machine is rebooted while the Notification
Proxy is sending data to the consumer.  In this case, the Notification
Service transport detects that the connection to the consumer, used as
the upcall for sending the data, has failed, and the transport cache and
Proxy are cleaned up.

In the second, the consumer first disconnects from the Notification
Service and then reboots.  In this case, the Notification Service
transport does not detect the failure and the upcall connection remains
in the transport cache.  After the new consumer connects back to the
Notification Service, a new Proxy is created.  When this new Proxy tries
to send its first message to the consumer, it finds the old connection
left in the transport cache and tries to use it.  The old connection is
no good and the send fails, which causes the Proxy to be deleted, so no
messages flow to the new consumer.

This problem can be fixed by validating any possible existing
connections to the consumer during Proxy initialization.  If the
connection is bad, it will be found and cleaned up at this time.
Then, when the new Proxy tries to send real data it will create 
a new good connection and the new consumer will receive data.
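
The failure and the proposed fix can be sketched with a small model.
This is an illustration only, not TAO code: Connection, TransportCache,
Proxy, the validate_on_init flag and the endpoint string are all
hypothetical names, and the alive flag stands in for whatever check the
real service would use to validate a cached connection.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a cached transport connection.
struct Connection {
    bool alive;  // false once the peer has gone away
    bool send(const std::string&) const { return alive; }
};

// Hypothetical transport cache keyed by consumer endpoint.
class TransportCache {
public:
    std::shared_ptr<Connection> find(const std::string& endpoint) const {
        auto it = cache_.find(endpoint);
        if (it == cache_.end()) return nullptr;
        return it->second;
    }
    void insert(const std::string& endpoint, std::shared_ptr<Connection> c) {
        cache_[endpoint] = std::move(c);
    }
    void purge(const std::string& endpoint) { cache_.erase(endpoint); }

    // Reuse a cached connection if one exists, else open a fresh one.
    std::shared_ptr<Connection> find_or_connect(const std::string& endpoint) {
        if (auto c = find(endpoint)) return c;
        auto fresh = std::make_shared<Connection>(Connection{true});
        insert(endpoint, fresh);
        return fresh;
    }

private:
    std::map<std::string, std::shared_ptr<Connection>> cache_;
};

// Hypothetical proxy that pushes events to one consumer.
class Proxy {
public:
    Proxy(TransportCache& cache, std::string endpoint, bool validate_on_init)
        : cache_(cache), endpoint_(std::move(endpoint)) {
        if (validate_on_init) {
            // Proposed fix: check any existing connection during
            // initialization and clean it up if it is bad.
            auto c = cache_.find(endpoint_);
            if (c && !c->alive) cache_.purge(endpoint_);
        }
    }
    bool push(const std::string& event) {
        return cache_.find_or_connect(endpoint_)->send(event);
    }

private:
    TransportCache& cache_;
    std::string endpoint_;
};

// Models the second path: the consumer disconnected and rebooted, leaving
// a dead connection in the cache. Returns whether the new proxy's first
// push succeeds.
bool first_push_after_reconnect(bool validate_on_init) {
    TransportCache cache;
    const std::string endpoint = "consumer-host:12345";  // made-up endpoint
    cache.insert(endpoint, std::make_shared<Connection>(Connection{false}));
    assert(cache.find(endpoint) != nullptr);  // stale entry is present
    Proxy proxy(cache, endpoint, validate_on_init);
    return proxy.push("event");
}
```

In this model, first_push_after_reconnect(false) reuses the stale entry
and the first push fails, which matches the reported behavior.  With
first_push_after_reconnect(true) the stale entry is purged during Proxy
initialization, so the first push opens a new connection and succeeds.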

The Notification Service is started with the command line option
-UseSeparateDispatchingORB 1 and a svc.conf file containing:

    static TAO_CosNotify_Service "-DispatchingThreads 1"

Since this problem also can occur on non-VxWorks platforms when the
-ORBEndPoint option is used on the Consumer, it should be possible
to create a regression test.  I will attempt to create such a test
and add it after the current beta is complete.