Bug 2381

Summary: IFR database corruption leading to IFR_Service crash
Product: TAO
Reporter: Richard Spence <richard.spence.extern>
Component: Interface Repository
Assignee: DOC Center Support List (internal) <tao-support>
Status: ASSIGNED ---    
Severity: normal    
Priority: P3    
Version: 1.4.8   
Hardware: x86   
OS: Linux   

Description Richard Spence 2006-01-26 03:44:46 CST
Simple IDL loaded into IFR can cause (apparent) database corruption which then
causes IFR_Service to crash when corrupted area is subsequently queried.

Instructions to provoke this problem:

1. Begin by starting IFR:

	$ IFR_Service &

2. Load the following IDL (assumed to be in a file called T1.idl):

	typedef unsigned short US;

	interface IF
	{
	  oneway void OP(in US p1);
	};

	$ tao_ifr \
		-ORBInitRef InterfaceRepository=file://if_repo.ior \
		IDL/T1.idl

3. Run the (attached) simple query program 'ops':

	$ ./ops \
		-ORBInitRef InterfaceRepository=file://if_repo.ior \
		IDL:IF:1.0

This should print something like:

	## IDL:IF:1.0
	~~ IDL:IF/OP:1.0

This is telling me that interface 'IF' has one operation called 'OP'. 
Obviously, this is correct.

4. Now load the following IDL (assumed to be in a file called T2.idl):

	typedef unsigned short US;

Yes, that is a repeat of the first line from T1.idl.

	$ tao_ifr \
		-ORBInitRef InterfaceRepository=file://if_repo.ior \
		IDL/T2.idl

5. Run the 'ops' program again, same command as step 3. This time I get 
the following failure + crash:

	path_to_def_kind - bad path: 'defns\00000001'
	path_to_idltype - not an IDLType: 'defns\00000001'
	CORBA::SystemException: describe_interface failed
	[1]  + Segmentation fault            IFR_Service

--------

It looks like there are possibly two problems:

a) IFR_Service gets corrupted when it is given IDL that overwrites 
definitions already loaded, and

b) tao_ifr doesn't query for existing definitions before loading the IFR.
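
A minimal sketch of how problem (a) could arise, assuming the repository stores
entries under generated paths and that an operation parameter refers to its
type by path (the 'bad path' message in step 5 suggests something like this).
ToyRepo, its members, and the "defns/N" path format are hypothetical and are
not TAO code:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical toy model, not TAO code: entries live under generated paths
// and other entries refer to them by path.
struct ToyRepo {
  std::map<std::string, std::string> path_to_id;  // path -> repository id
  std::map<std::string, std::string> id_to_path;  // repository id -> path
  unsigned next = 0;

  // Stores a new entry and returns its generated path.
  std::string add(const std::string& id) {
    std::string path = "defns/" + std::to_string(next++);
    path_to_id[path] = id;
    id_to_path[id] = path;
    return path;
  }

  // Naive reload: destroy any entry with the same id, then create a new one.
  std::string replace(const std::string& id) {
    auto it = id_to_path.find(id);
    if (it != id_to_path.end()) {
      path_to_id.erase(it->second);
      id_to_path.erase(it);
    }
    return add(id);
  }

  // True if an entry is still stored under the given path.
  bool exists(const std::string& path) const {
    return path_to_id.count(path) != 0;
  }
};

// Replays steps 2 to 5: returns true if the parameter's stored type path
// still resolves after the typedef has been loaded a second time.
bool param_type_survives_reload() {
  ToyRepo repo;
  std::string us_path = repo.add("IDL:US:1.0");  // T1.idl: typedef US
  repo.add("IDL:IF:1.0");                        // T1.idl: interface IF
  std::string param_type = us_path;              // OP's p1 refers to US by path
  assert(repo.exists(param_type));               // fine after step 2
  repo.replace("IDL:US:1.0");                    // T2.idl: same typedef again
  return repo.exists(param_type);
}
```

In this model param_type_survives_reload() returns false: the interface entry
is untouched, but the path it holds for the parameter type no longer exists.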

--------

And here is the source for the 'ops' program:

#include <iostream>
#include <unistd.h>
#include "tao/ORB.h"
#include "tao/IFR_Client/IFR_BasicC.h"

int
main
(int argc, char* argv[])
{
   CORBA::ORB_var orb;
   CORBA::Object_var orepos;

   try {
     orb = CORBA::ORB_init(argc, argv);
   }
   catch(const CORBA::SystemException&) {
     std::cerr << "CORBA::SystemException: ORB_init failed\n";
     return 1;
   }

   try {
     orepos = orb->resolve_initial_references("InterfaceRepository");
   }
   catch(const CORBA::SystemException&) {
     std::cerr << "CORBA::SystemException: resolve_initial_references failed\n";
     return 1;
   }

   if(CORBA::is_nil(orepos)) {
     std::cerr << "nil repository reference\n";
     return 1;
   }

   CORBA::Repository_var repos = CORBA::Repository::_narrow(orepos.in());

   if(CORBA::is_nil(repos)) {
     std::cerr << "failed to narrow repository reference\n";
     return 1;
   }

   for(int arg = 1; arg < argc; ++arg) {
     std::cout << "## " << argv[arg] << std::endl;

     CORBA::Contained_var defn;

     try {
       defn = repos->lookup_id(argv[arg]);
     }
     catch(const CORBA::SystemException&) {
       std::cerr << "CORBA::SystemException: lookup_id failed\n";
       return 1;
     }
	
     if(CORBA::is_nil(defn)) {
       std::cout << argv[arg] << ": not found" << std::endl;
     }
     else {
       CORBA::DefinitionKind kind;

       try {
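	// Hold the result in a _var so the returned Description is released.
	CORBA::Contained::Description_var description = defn->describe();
	kind = description->kind;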
       }
       catch(const CORBA::SystemException&) {
	std::cerr << "CORBA::SystemException: describe failed\n";
	return 1;
       }

       if(kind == CORBA::dk_Interface) {
	CORBA::InterfaceDef_var intf = CORBA::InterfaceDef::_narrow(defn.in());

	if(CORBA::is_nil(intf)) {
	  std::cerr << "failed to narrow to interface\n";
	  return 1;
	}

	// A _var releases the description when it goes out of scope.
	CORBA::InterfaceDef::FullInterfaceDescription_var desc;

	try {
	  desc = intf->describe_interface();
	}
	catch(const CORBA::SystemException&) {
	  std::cerr << "CORBA::SystemException: describe_interface failed\n";
	  return 1;
	}

	for(size_t idx = 0; idx < desc->operations.length(); ++idx) {
	  std::cout << "~~ " << desc->operations[idx].id << std::endl;
	}
       }
       else {
	std::cout << argv[arg] << ": not an interface" << std::endl;
       }
     }
   }

   return 0;
}

Comment 1 Jeff Parsons 2006-01-26 14:47:21 CST
Fixed

Thu Jan 26 20:36:47 UTC 2006  Jeff Parsons <j.parsons@vanderbilt.edu>

        * orbsvcs/IFR_Service/be_produce.cpp(BE_cleanup):
        
          Removed code to destroy the temporary holding scope entry in
          the repository after each IDL file is processed. Instead the
          lifetime of that entry is now tied to the repository itself.
          
        * orbsvcs/IFR_Service/ifr_adding_visitor.cpp (visit_typedef):
        
          Removed code that replaces a typedef with the same repo id
          with a new entry, which would invalidate any references to
          the typedef entry that other entries may hold. The IFR will
          now throw the BAD_PARAM minor code that corresponds to an
          attempt to create an entry for a repo id that already exists
          in the repository. Thanks to Richard Spence
          <richard dot spence dot extern at icn dot siemens dot de>
          for reporting the problem when the typedef is used as an
          operation parameter. This closes [BUGID:2381].
          
        * orbsvcs/orbsvcs/IFRService/IFR_Service_Utils.cpp (name_exists):
        
          Changed the loop to be a FOR loop using the explicit section
          names, rather than a while loop calling enumerate_sections()
          to get each section name. 

Comment 2 Richard Spence 2006-02-20 09:16:12 CST
Retested the given scenario with TAO 1.4.9. I now get a segfault during step 4.
GDB backtrace follows:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1214523712 (LWP 7998)]
0xb7bec5c7 in TAO::Invocation_Adapter::~Invocation_Adapter ()
    at /lvol1/ACE_wrappers/TAO/tao/Sequence_T.i:15
(gdb) info stack
#0  0xb7bec5c7 in TAO::Invocation_Adapter::~Invocation_Adapter ()
    at /lvol1/ACE_wrappers/TAO/tao/Sequence_T.i:15
#1  0xb7e154ba in CORBA::Container::create_alias (this=0x0, 
    id=0x4 <Address 0x4 out of bounds>, name=0x4 <Address 0x4 out of bounds>, 
    version=0x4 <Address 0x4 out of bounds>, original_type=0x4)
    at IFR_Client/IFR_BaseC.cpp:4209
#2  0xb7fe5021 in ifr_adding_visitor::visit_typedef (this=0xbffff580, 
    node=0x80886d8) at ifr_adding_visitor.cpp:2393
#3  0xb7f87979 in AST_Typedef::ast_accept (this=0x80886d8, visitor=0x4)
    at ast/ast_typedef.cpp:185
#4  0xb7fe32bf in ifr_adding_visitor::visit_scope (this=0xbffff580, 
    node=0x8088700) at ifr_adding_visitor.cpp:98
#5  0xb7fe5df3 in ifr_adding_visitor::visit_root (this=0xbffff580, 
    node=0x807b2b8) at ifr_adding_visitor.cpp:2442
#6  0xb7f822d1 in AST_Root::ast_accept (this=0x807b240, visitor=0x4)
    at ast/ast_root.cpp:215
#7  0xb7fe3079 in BE_produce () at be_produce.cpp:229
#8  0x0804e8dc in DRV_drive (s=0x807b0a8 "t2.idl")
    at /lvol1/ACE_wrappers/TAO/TAO_IDL/tao_idl.cpp:261
#9  0x0804ed29 in main (argc=2, argv=0x0)
    at /lvol1/ACE_wrappers/TAO/TAO_IDL/tao_idl.cpp:345

Comment 3 Jeff Parsons 2006-02-20 15:36:51 CST
I tried this on Windows and Linux workspaces, and each time I get the expected 
exception. From your stack trace, it looks like this line in 
ifr_adding_visitor::visit_typedef()

      if (be_global->ifr_scopes ().top (current_scope) == 0)

is succeeding but putting 0 into current_scope. That may be
because this line in ifr_adding_visitor::visit_root()

  if (be_global->ifr_scopes ().push (be_global->repository ()) != 0)

is succeeding, but be_global->repository() is returning 0.
This in turn may be because code in be_produce.cpp (BE_ifr_repo_init)
is not working as expected.
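
The guess above can be sketched in isolation, assuming a stack whose push()
and top() return 0 on success and which accepts a null pointer as a value.
ScopeStack and Scope are hypothetical stand-ins, not the actual ifr_scopes()
type:

```cpp
#include <vector>

struct Scope { int id; };

// Hypothetical stand-in for a stack with 0-on-success, -1-on-failure returns.
class ScopeStack {
public:
  // Always succeeds, even when handed a null pointer.
  int push(Scope* s) {
    items_.push_back(s);
    return 0;
  }

  // Copies the top element into 'out'; fails only when the stack is empty.
  int top(Scope*& out) const {
    if (items_.empty()) return -1;
    out = items_.back();
    return 0;
  }

private:
  std::vector<Scope*> items_;
};

// Returns true only if there is a usable (non-null) current scope.
bool has_usable_scope(const ScopeStack& stack) {
  Scope* current = nullptr;
  return stack.top(current) == 0 && current != nullptr;
}

// Replays the suspected sequence: a null repository reference is pushed,
// both calls report success, and the resulting scope is still null.
bool null_push_goes_unnoticed() {
  ScopeStack stack;
  Scope* repository = nullptr;  // stands in for repository() returning 0
  const bool push_ok = stack.push(repository) == 0;
  Scope* current = nullptr;
  const bool top_ok = stack.top(current) == 0;
  return push_ok && top_ok && current == nullptr;
}
```

If this is what is happening, checking the return codes alone would not catch
it; the value itself would need a nil check, as in has_usable_scope().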

Please check these steps out in your debugger, and see if
any of my guesses are on the right track - it would help
a lot in tracking down the source of the problem.

Comment 4 Richard Spence 2006-02-21 04:31:16 CST
After further debugging as requested I can declare the following:

Inside ifr_adding_visitor::visit_typedef variable current_scope is non-null:

(gdb) p current_scope
$7 = 0x808f4c8
(gdb) p *$
$8 = {<CORBA::IRObject> = {<CORBA::Object> = {_vptr.Object = 0xb7ef5988, 
      servant_ = 0x0, proxy_broker_ = 0xb7c88e38, is_collocated_ = false, 
      is_local_ = false, is_evaluated_ = true, 
      ior_ = {<TAO_Var_Base_T<IOP::IOR>> = {ptr_ = 0x0}, <No data fields>}, 
      orb_core_ = 0x806e870, protocol_proxy_ = 0x8089280, refcount_ = 1, 
      refcount_lock_ = 0x808f618}, _vptr.IRObject = 0xb7ef5900, 
    the_TAO_IRObject_Proxy_Broker_ = 0x0}, _vptr.Container = 0xb7ef5878, 
  static _tc_Description = 0xb7f116d4, 
  static _tc_DescriptionSeq = 0xb7f116c0, 
  the_TAO_Container_Proxy_Broker_ = 0x0}

To my non-expert eye this all looks OK.

Execution reaches create_alias() as follows:

CORBA::Container::create_alias (this=0x808f4c8, id=0x8088818 "IDL:US:1.0", 
    name=0x8088818 "IDL:US:1.0", version=0x8088818 "IDL:US:1.0", 
    original_type=0x8088818) at /lvol1/ACE_wrappers/TAO/tao/Object.i:81

I can trace execution as far as _tao_call.invoke(...) where everything falls
apart. I will try to debug further but in the meantime my programming sixth
sense is screaming that I should check build configuration and hygiene.

I built 1.4.9 (as always) from a clean tarball using the 'standard' traditional
process, the only deviation being 'TAO_ORBSVCS = IFRService'.

I will get a colleague to build ACE+TAO (5.4.9/1.4.9) and test this behaviour in
his environment as a sanity check.

Jeff, if you have a non-vanilla Linux build process, can you please post it so
that I can see if it 'fixes' this problem in my environment?

Comment 5 Jeff Parsons 2006-02-21 07:22:59 CST
Here are the specs on my Linux workspace:

Linux version 2.4.21-27.0.2.ELsmp (bhcompile@tweety.build.redhat.com) (gcc 
version 3.2.3 20030502 (Red Hat Linux 3.2.3-53)), GNU Make version 3.79.1.

Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
--infodir=/usr/share/info --enable-shared --enable-threads=posix
--disable-checking --with-system-zlib --enable-__cxa_atexit --host=i386-redhat-linux
Thread model: posix


and all the makefiles are generated from MPC as usual. Hope this helps.

Comment 6 Jeff Parsons 2006-02-21 07:36:25 CST
The debug messages seem to indicate that the problem is at a lower level in 
the client-side or server-side ORB, i.e., the stub (invoke method) or the POA 
(invocation_adapter). Have you tried any examples with remote calls to 
something other than the IFR?

Comment 7 Jeff Parsons 2007-01-08 12:02:45 CST
Any further progress on this issue? I'd like to tie up this loose end before 
we cut the next beta.