Bug 1947 - Deadlock in naming service.
Summary: Deadlock in naming service.
Status: NEW
Alias: None
Product: TAO
Classification: Unclassified
Component: Name Service
Version: 1.4.1
Hardware: x86 Windows 2000
Importance: P3 blocker
Assignee: DOC Center Support List (internal)
URL:
Depends on:
Blocks:
 
Reported: 2004-09-27 10:08 CDT by Alain Dupont
Modified: 2008-03-31 04:41 CDT

Attachments
Patch to resolve this bug. (7.09 KB, patch)
2004-09-27 10:11 CDT, Alain Dupont
There was a problem with the previous patch in the resolve method. Here is the latest patch. (7.06 KB, patch)
2004-10-08 15:50 CDT, Alain Dupont

Description Alain Dupont 2004-09-27 10:08:55 CDT
Hi,

We have a system where each machine has its own naming service. For example, 
with 2 machines in the system, each machine has a NamingContext (Machine-A) 
pointing to the root of Machine A's naming service and a NamingContext 
(Machine-B) pointing to the root of Machine B's naming service. In the 
NamingViewer, the naming services should look like this:

For Machine A:
  Root
      Machine-A
      Machine-B
      TestServant


For Machine B:
  Root
      Machine-A
      Machine-B
      TestServant

The deadlock occurs when both machines try to resolve the TestServant at the 
same time:
On machine B:
  Machine-B-Root->resolve_str("Machine-A/TestServant");

On machine A:
  Machine-A-Root->resolve_str("Machine-B/TestServant");

Where Machine-A-Root and Machine-B-Root are the root NamingContexts of 
Machine A and Machine B respectively.

The deadlock occurs in the Root NamingContext of each machine. Machine B holds 
the lock on its own Root context while it resolves the Machine-A context, so 
Machine A cannot resolve the Machine-B context. At the same time, Machine A 
holds the lock on its own Root context, so Machine B cannot resolve the 
TestServant object there.
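
The lock cycle described above can be modelled without CORBA. The following is
only a sketch in standard C++: the two std::timed_mutex objects stand in for the
locks of the two Root contexts, and lock_cycle_detected, root_a and root_b are
hypothetical names that do not appear in TAO. A timed lock attempt is used so
that the model terminates; in the real service the nested call simply blocks.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>

// Hypothetical model: one lock per Root context, no CORBA involved.
// Returns true when neither side could obtain the other's root lock.
inline bool lock_cycle_detected() {
  std::timed_mutex root_a, root_b;
  std::promise<void> a_locked, b_locked, a_done, b_done;
  std::future<void> a_is_locked = a_locked.get_future();
  std::future<void> b_is_locked = b_locked.get_future();
  std::future<void> a_is_done = a_done.get_future();
  std::future<void> b_is_done = b_done.get_future();
  const auto timeout = std::chrono::milliseconds(100);

  // One side of the exchange: lock the own root, then try the peer's root.
  auto side = [timeout](std::timed_mutex& own, std::timed_mutex& peer,
                        std::promise<void>& own_locked,
                        std::future<void>& peer_locked,
                        std::promise<void>& own_done,
                        std::future<void>& peer_done) {
    std::lock_guard<std::timed_mutex> guard(own);
    own_locked.set_value();
    peer_locked.wait();  // both roots are locked from here on
    const bool got_peer = peer.try_lock_for(timeout);
    if (got_peer) peer.unlock();
    own_done.set_value();
    peer_done.wait();    // keep the own root locked until the peer gave up
    return got_peer;
  };

  auto from_a = std::async(std::launch::async, [&] {
    return side(root_a, root_b, a_locked, b_is_locked, a_done, b_is_done);
  });
  auto from_b = std::async(std::launch::async, [&] {
    return side(root_b, root_a, b_locked, a_is_locked, b_done, a_is_done);
  });
  const bool a_got = from_a.get();
  const bool b_got = from_b.get();
  return !a_got && !b_got;
}
```

Each side holds its own root lock for the whole duration of the other side's
attempt, which is the situation both naming services end up in.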

The deadlock is in the Naming Service implementation (the 
TAO_Hash_Naming_Context and TAO_Storable_Naming_Context classes), in the 
resolve method and in every other method that takes a CosNaming::Name parameter:
CORBA::Object_ptr
TAO_Hash_Naming_Context::resolve (const CosNaming::Name& n
                                  ACE_ENV_ARG_DECL)
{
  ACE_GUARD_THROW_EX (TAO_SYNCH_RECURSIVE_MUTEX, ace_mon, this->lock_,
                      CORBA::INTERNAL ());
{
...

          // If there are any exceptions, they will propagate up.
          ACE_TRY
            {
              CORBA::Object_ptr resolved_ref;
->>>>> Here, it should release the previously acquired mutex before calling the 
context of another machine.
              resolved_ref = context->resolve (rest_of_name
                                               ACE_ENV_ARG_PARAMETER);
              ACE_TRY_CHECK;
              return resolved_ref;
            }
          ACE_CATCH (CORBA::TIMEOUT, timeoutEx)
            {
              ACE_PRINT_EXCEPTION (timeoutEx, "Hash_Naming_Context::resolve (), "
                                   "Caught CORBA::TIMEOUT exception");
              // throw a CannotProceed exception back to the client
              //
              ACE_TRY_THROW (CosNaming::NamingContext::CannotProceed
                             (context.in (), rest_of_name));
            }
          ACE_ENDTRY;
        }
 ...
}
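
The change suggested at the marker above (release the mutex before calling the
context of another machine) could look roughly like the sketch below. It is
written in standard C++ rather than against ACE/TAO, and it is not taken from
either attached patch: SketchContext, bind_object, bind_context and
cross_resolve_completes are hypothetical names, and an int stands in for the
resolved object reference.

```cpp
#include <cassert>
#include <future>
#include <map>
#include <mutex>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a naming context; this is not the TAO class.
class SketchContext {
public:
  void bind_object(const std::string& name, int value) {
    std::lock_guard<std::recursive_mutex> guard(lock_);
    objects_[name] = value;
  }

  void bind_context(const std::string& name, SketchContext* ctx) {
    std::lock_guard<std::recursive_mutex> guard(lock_);
    contexts_[name] = ctx;
  }

  // Resolves a '/'-separated path such as "Machine-B/TestServant".
  int resolve(const std::string& path) {
    SketchContext* next = nullptr;
    std::string rest;
    {
      // Hold the lock only while reading this context's own tables.
      std::lock_guard<std::recursive_mutex> guard(lock_);
      const std::string::size_type slash = path.find('/');
      if (slash == std::string::npos) {
        auto it = objects_.find(path);
        if (it == objects_.end())
          throw std::runtime_error("not found: " + path);
        return it->second;
      }
      auto it = contexts_.find(path.substr(0, slash));
      if (it == contexts_.end())
        throw std::runtime_error("no context: " + path);
      next = it->second;
      rest = path.substr(slash + 1);
    }  // Lock released here, before calling into the other context.
    return next->resolve(rest);
  }

private:
  std::recursive_mutex lock_;
  std::map<std::string, int> objects_;
  std::map<std::string, SketchContext*> contexts_;
};

// Two "machines" resolve through each other at the same time.
inline bool cross_resolve_completes() {
  SketchContext root_a, root_b;
  root_a.bind_object("TestServant", 1);
  root_b.bind_object("TestServant", 2);
  root_a.bind_context("Machine-B", &root_b);
  root_b.bind_context("Machine-A", &root_a);

  auto from_a = std::async(std::launch::async, [&] {
    return root_a.resolve("Machine-B/TestServant");
  });
  auto from_b = std::async(std::launch::async, [&] {
    return root_b.resolve("Machine-A/TestServant");
  });
  return from_a.get() == 2 && from_b.get() == 1;
}
```

Because no thread holds its own lock while it waits on the other context, the
two resolves cannot block each other. The trade-off is that the local context
may change between the lookup and the nested call, so a real fix would have to
decide whether that window is acceptable.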
Comment 1 Alain Dupont 2004-09-27 10:11:06 CDT
Created attachment 283 [details]
Patch to resolve this bug.
Comment 2 Ossama Othman 2004-09-28 13:18:58 CDT
I'll handle this bug report.
Comment 3 Ossama Othman 2004-09-28 13:19:10 CDT
Mine.
Comment 4 Alain Dupont 2004-10-08 15:50:17 CDT
Created attachment 287 [details]
There was a problem with the previous patch in the resolve method. Here is the latest patch.
Comment 5 Johnny Willemsen 2008-03-31 04:41:07 CDT
Comment on attachment 283 [details]
Patch to resolve this bug.

incorrect