Bug 3545 - race condition in transport cache when purging
Summary: race condition in transport cache when purging
Status: RESOLVED FIXED
Alias: None
Product: TAO
Classification: Unclassified
Component: ORB
Version: 1.6.7
Hardware: All Windows NT
Importance: P3 normal
Assignee: Johnny Willemsen
URL:
Depends on:
Blocks: 3543
Reported: 2009-01-17 04:57 CST by Johnny Willemsen
Modified: 2009-02-17 13:09 CST
Description Johnny Willemsen 2009-01-17 04:57:41 CST
When the transport cache purges connections, it searches for idle ones with the lock held and then sets each such transport to busy.

After going through all the transports, it releases the lock and then closes the transports that it wants to purge. After the lock is released, it can happen that a transport is used for making a new connection before it gets purged. At that moment the close will close a transport that is being used for an invocation, and the caller gets a TRANSIENT exception back.
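
The window described above can be replayed on a single thread. This is only a minimal sketch with hypothetical names (Transport, Cache, select_idle, acquire, close_all); none of them are TAO classes. It also leaves out the busy marking mentioned above and simply keeps the selected entries visible to callers, which is an assumption made to keep the window observable.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a cached transport; not a TAO class.
struct Transport {
  bool open = true;
  bool in_use = false;
};

struct Cache {
  std::mutex lock;
  std::vector<Transport> entries;

  // Purger, phase 1: pick the idle entries while holding the lock.
  std::vector<Transport*> select_idle() {
    std::lock_guard<std::mutex> guard(lock);
    std::vector<Transport*> victims;
    for (Transport& t : entries)
      if (t.open && !t.in_use) victims.push_back(&t);
    return victims;
  }  // the lock is released here

  // Caller: look for an idle transport to send an invocation on.
  Transport* acquire() {
    std::lock_guard<std::mutex> guard(lock);
    for (Transport& t : entries)
      if (t.open && !t.in_use) { t.in_use = true; return &t; }
    return nullptr;
  }
};

// Purger, phase 2: close the selected entries without holding the lock.
// Returns how many of the closed transports were in use by a caller.
inline int close_all(const std::vector<Transport*>& victims) {
  int closed_in_use = 0;
  for (Transport* t : victims) {
    if (t->in_use) ++closed_in_use;  // this caller would see the failure
    t->open = false;
  }
  return closed_in_use;
}

// Replays the problematic interleaving in a fixed order.
inline int replay_race() {
  Cache cache;
  cache.entries.resize(2);
  std::vector<Transport*> victims = cache.select_idle();  // purger, phase 1
  Transport* mine = cache.acquire();                      // caller slips in
  assert(mine != nullptr);
  return close_all(victims);                              // purger, phase 2
}
```

Calling replay_race() reports one transport that was closed while a caller was using it, which corresponds to the failing invocation in the description.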
Comment 1 Steve Totten 2009-01-18 15:26:54 CST
(In reply to comment #0)
> when the transport cache will purge connections, it searches for idle ones with
> the lock hold, then it sets the transport to busy.
> 
> after going through all the transports it releases the lock and then closes the
> transports that it wants to purge. after the lock is released it could happen
> that a transport is used for making a new connection before it gets purged.

I don't understand what you mean by "a transport is used for making a new
connection."  Do you mean used for making a new invocation?  If the transport
is already in the cache, then the connection already exists.  A new connection
would result in a new entry in the cache.

> at
> that moment the close will close a transport which is used for making an
> invocation and then the caller gets a transient back.

Perhaps the cache entries need a new state, such as PURGE_IN_PROGRESS.
Then, when the cache is being searched for a suitable transport for sending
an invocation, any transport in the PURGE_IN_PROGRESS state would be ignored.
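
A minimal sketch of the PURGE_IN_PROGRESS idea, assuming a hypothetical StateCache with hypothetical entry states; it is not TAO's cache and only shows that a marked entry is skipped by the lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical states; only PURGE_IN_PROGRESS is named in the comment above.
enum class EntryState { IDLE, BUSY, PURGE_IN_PROGRESS, CLOSED };

struct Entry {
  EntryState state = EntryState::IDLE;
};

class StateCache {
public:
  explicit StateCache(std::size_t n) : entries_(n) {}

  // Purger, phase 1: mark up to max_purge idle entries under the lock.
  std::vector<Entry*> mark_for_purge(std::size_t max_purge) {
    std::lock_guard<std::mutex> guard(lock_);
    std::vector<Entry*> marked;
    for (Entry& e : entries_) {
      if (marked.size() == max_purge) break;
      if (e.state == EntryState::IDLE) {
        e.state = EntryState::PURGE_IN_PROGRESS;
        marked.push_back(&e);
      }
    }
    return marked;
  }

  // Caller: only IDLE entries are eligible, so marked entries are ignored.
  Entry* find_idle() {
    std::lock_guard<std::mutex> guard(lock_);
    for (Entry& e : entries_) {
      if (e.state == EntryState::IDLE) {
        e.state = EntryState::BUSY;
        return &e;
      }
    }
    return nullptr;
  }

  // Purger, phase 2: runs after phase 1 has released the lock.
  void close_marked(const std::vector<Entry*>& marked) {
    for (Entry* e : marked) {
      std::lock_guard<std::mutex> guard(lock_);
      assert(e->state == EntryState::PURGE_IN_PROGRESS);  // nobody took it
      e->state = EntryState::CLOSED;
    }
  }

private:
  std::mutex lock_;
  std::vector<Entry> entries_;
};
```

With two entries and one of them marked, find_idle() returns the other entry, and close_marked() never touches an entry that a caller is using.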

Either that, or entries to be purged could be removed from the cache while the
lock is held and copied to a special purge list.  Then, the lock could be
released and the purging logic could safely close the connections on the
purge list.
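
The purge-list alternative could look roughly like this. Again a sketch under assumptions: Conn, ListCache, detach_idle and close_detached are hypothetical names, and shared ownership via std::shared_ptr is an illustration choice, not how TAO manages transports.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a cached connection; not a TAO class.
struct Conn {
  bool open = true;
  bool busy = false;
};

class ListCache {
public:
  void add(std::shared_ptr<Conn> c) {
    std::lock_guard<std::mutex> guard(lock_);
    conns_.push_back(std::move(c));
  }

  // Caller: take the first idle connection that is still in the cache.
  std::shared_ptr<Conn> acquire() {
    std::lock_guard<std::mutex> guard(lock_);
    for (auto& c : conns_)
      if (!c->busy) { c->busy = true; return c; }
    return nullptr;
  }

  // Purger: move idle entries out of the cache while holding the lock.
  std::vector<std::shared_ptr<Conn>> detach_idle() {
    std::lock_guard<std::mutex> guard(lock_);
    std::vector<std::shared_ptr<Conn>> purge_list;
    auto it = conns_.begin();
    while (it != conns_.end()) {
      if (!(*it)->busy) {
        purge_list.push_back(std::move(*it));
        it = conns_.erase(it);
      } else {
        ++it;
      }
    }
    return purge_list;
  }

  std::size_t size() {
    std::lock_guard<std::mutex> guard(lock_);
    return conns_.size();
  }

private:
  std::mutex lock_;
  std::vector<std::shared_ptr<Conn>> conns_;
};

// Purger: close the detached connections; no cache lock is needed because
// callers can no longer find these entries.
inline void close_detached(std::vector<std::shared_ptr<Conn>>& purge_list) {
  for (auto& c : purge_list) {
    assert(!c->busy);
    c->open = false;
  }
}
```

Once detach_idle() returns, a later acquire() cannot see the detached entries, so the close happens outside the lock without racing against a lookup.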

Thoughts?

Steve
Comment 2 Johnny Willemsen 2009-01-19 00:18:35 CST
(In reply to comment #1)
> (In reply to comment #0)
> I don't understand what you mean by "a transport is used for making a new
> connection."  Do you mean used for making a new invocation?  If the transport
> is already in the cache, then the connection already exists.  A new connection
> would result in a new entry in the cache.

I meant a new invocation.

> > at
> > that moment the close will close a transport which is used for making an
> > invocation and then the caller gets a transient back.
> 
> Perhaps the cache entries need a new state, such as PURGE_IN_PROGRESS.
> Then, when the cache is being searched for a suitable transport for sending
> an invocation, any transport in the PURGE_IN_PROGRESS state would be ignored.

Or, when a transport in that state is picked for an invocation, it is set to busy and used, but then not purged.
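
A sketch of that variant, assuming hypothetical names (Slot, Item, RecheckCache): a caller may still claim a marked entry, and the purger rechecks the state under the lock before closing, so a claimed entry is skipped. It does not reflect the fix that was eventually committed.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical states for this illustration only.
enum class Slot { IDLE, BUSY, PURGE_PENDING, CLOSED };

struct Item {
  Slot state = Slot::IDLE;
};

class RecheckCache {
public:
  explicit RecheckCache(std::size_t n) : items_(n) {}

  // Purger, phase 1: mark every idle entry under the lock.
  std::vector<Item*> mark_all_idle() {
    std::lock_guard<std::mutex> guard(lock_);
    std::vector<Item*> marked;
    for (Item& it : items_) {
      if (it.state == Slot::IDLE) {
        it.state = Slot::PURGE_PENDING;
        marked.push_back(&it);
      }
    }
    return marked;
  }

  // Caller: a marked entry is still usable; claiming it makes it BUSY.
  Item* claim() {
    std::lock_guard<std::mutex> guard(lock_);
    for (Item& it : items_) {
      if (it.state == Slot::IDLE || it.state == Slot::PURGE_PENDING) {
        it.state = Slot::BUSY;
        return &it;
      }
    }
    return nullptr;
  }

  // Purger, phase 2: recheck each entry under the lock and close it only
  // if no caller claimed it in the meantime. Returns the number closed.
  std::size_t purge(const std::vector<Item*>& marked) {
    std::size_t closed = 0;
    for (Item* it : marked) {
      std::lock_guard<std::mutex> guard(lock_);
      assert(it->state != Slot::IDLE);  // marked entries never revert here
      if (it->state == Slot::PURGE_PENDING) {
        it->state = Slot::CLOSED;
        ++closed;
      }
    }
    return closed;
  }

private:
  std::mutex lock_;
  std::vector<Item> items_;
};
```

The sketch flips the state to CLOSED while holding the lock; a real implementation would probably want the actual socket close to happen after the lock is dropped again.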

> Either that, or entries to be purged could be removed from the cache while the
> lock is held and copied to a special purge list.  Then, the lock could be
> released and the purging logic could safely close the connections on the
> purge list.
> 
> Thoughts?

I can reproduce it now with the bug 3543 regression test with 1000 clients and 20 threads. It seems this was introduced with the cache refactoring of last summer; with 1.6.1 it seems to run.
Comment 3 Johnny Willemsen 2009-02-02 06:16:35 CST
mine
Comment 4 Johnny Willemsen 2009-02-17 13:09:04 CST
has been fixed