Bug 1416

Summary: core dump in __static_initialization_and_destruction_0
Product: TAO
Reporter: Oliver M. Kellogg <oliver.kellogg>
Component: ORB
Assignee: DOC Center Support List (internal) <tao-support>
Status: REOPENED
Severity: normal    
Priority: P3    
Version: 1.2.8   
Hardware: All   
OS: All   

Description Oliver M. Kellogg 2003-01-02 18:09:39 CST
The static initialization of the TAO_changer object (class
TAO_Resource_Factory_Changer) in tao/Strategies/advanced_resource.h is not
guaranteed to run after the static initialization of the ORB_Core, yet the
TAO_changer constructor relies on that order. When the order is reversed,
the program may crash during static initialization of the TAO_changer.

The problem is a basic one, and may happen with any kind of toolset. For a 
discussion, see

  http://www.parashift.com/c++-faq-lite/ctors.html#faq-10.11


Here is the original PRF posted to tao-bugs, and some excerpts from the
discussion that ensued. (This is pretty crude. Feel free to add/delete 
as you see fit.)


From: Oliver Kellogg (oliver.kellogg@sysde.eads.net)
Subject: [tao-bugs] TAO 1.2.7: core dump in 
__static_initialization_and_destruction_0 
Newsgroups: comp.soft-sys.ace
Date: 2002-12-02 08:06:37 PST 

    TAO VERSION: 1.2.7
    ACE VERSION: 5.2.7

    HOST MACHINE and OPERATING SYSTEM:
        i686, RedHat 7.1 based 2.4.18-17.7.x kernel

    TARGET MACHINE and OPERATING SYSTEM, if different from HOST:
    COMPILER NAME AND VERSION (AND PATCHLEVEL):
        gcc version 2.96 20000731 (Red Hat Linux 7.2 2.96-112.7.1)

    AREA/CLASS/EXAMPLE AFFECTED:
        Own application

    DOES THE PROBLEM AFFECT:
        COMPILATION?
            no
        LINKING?
            no
        EXECUTION?
            yes
        OTHER (please specify)?

    SYNOPSIS:
        An application dumps core in g++'s
        __static_initialization_and_destruction_0 even before the main
        program starts.

    DESCRIPTION:
        Here is the gdb backtrace:
(gdb) bt
#0  0x40846fdc in ACE_String_Base<char>::set (this=0x41520d04, s=0x811e170 "Advanced_Resource_Factory", len=25, release=1)
    at /home/kellogg/ace-cvs/ACE_wrappers/ace/String_Base.cpp:33
#1  0x40846f81 in ACE_String_Base<char>::operator= (this=0x41520d04, s=@0xbffff03c)
    at /home/kellogg/ace-cvs/ACE_wrappers/ace/String_Base.i:124
#2  0x41398e1c in TAO_ORB_Core::set_resource_factory (resource_factory_name=0x415e6240 "Advanced_Resource_Factory")
    at ORB_Core.cpp:1145
#3  0x4159fb55 in TAO_Resource_Factory_Changer::TAO_Resource_Factory_Changer (this=0x41609574)
    at advanced_resource.cpp:40
#4  0x415a33a2 in __static_initialization_and_destruction_0 (__initialize_p=1, __priority=65535)
    at /home/kellogg/ace-cvs/ACE_wrappers/ace/Free_List.cpp:137
#5  0x415a37b9 in global constructors keyed to TAO_Resource_Factory_Changer::TAO_Resource_Factory_Changer ()
    at /home/kellogg/ace-cvs/ACE_wrappers/ace/Timer_Queue_T.cpp:189
#6  0x415a4605 in __do_global_ctors_aux () at LF_Strategy_Null.cpp:5
#7  0x415778ee in _init () at eval.c:41
#8  0x4000d0e1 in _dl_init (main_map=0x40015ef8, argc=1, argv=0xbffff144, env=0xbffff14c) at dl-init.c:70

#0  0x40846fdc in ACE_String_Base<char>::set (this=0x41520d04, s=0x811e170 "Advanced_Resource_Factory", len=25, release=1)
    at /home/kellogg/ace-cvs/ACE_wrappers/ace/String_Base.cpp:33
33            ACE_ALLOCATOR (temp,
(gdb) list
28        // Case 1. Going from memory to more memory
29        size_t new_buf_len = len + 1;
30        if (s != 0 && len != 0 && release && this->buf_len_ < new_buf_len)
31          {
32            CHAR *temp;
33            ACE_ALLOCATOR (temp,
34                           (CHAR *) this->allocator_->malloc (new_buf_len * sizeof (CHAR)));
35
36            if (this->release_)
37              this->allocator_->free (this->rep_);
Current language:  auto; currently c++
(gdb) p *this
$1 = {<ACE_String_Base_Const> = {static npos = -1}, allocator_ = 0x0, len_ = 0, buf_len_ = 0, rep_ = 0x0,
  release_ = 0, static NULL_String_ = 0 '\000'}

        As you can see, the allocator_ of the ACE_String_Base<char> has
        not yet been set.
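
        A note on the zeroed members: objects with static storage
        duration are zero-initialized before any dynamic initialization
        runs, so an all-zero object like the one gdb prints above is
        consistent with a static string whose constructor has not yet
        run. The following is a minimal single-file sketch of that
        effect, not TAO code. The namespace and every name in it are
        hypothetical, and the volatile variable is only there to keep
        the compiler from folding the initializer into a constant.

```cpp
namespace init_order_demo {

// Hypothetical stand-in for "the allocator has been set up".
// Constant-initialized; because it is volatile, reads of it cannot be
// folded into other initializers at compile time.
volatile int allocator_source = 1;

int make_allocator() { return allocator_source; }

// Declared here, defined (and dynamically initialized) further down.
extern int allocator_ready;

// Dynamic initialization #1: runs before allocator_ready's initializer,
// so it observes the zero-initialized value.
int seen_by_early_object = allocator_ready;

// Dynamic initialization #2: only now does allocator_ready become 1.
int allocator_ready = make_allocator();

}  // namespace init_order_demo
```

        Within one translation unit the order is the order of
        definition, so the outcome here is fixed. Across translation
        units or shared libraries, as in this report, the relative
        order is unspecified and either outcome can occur.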

        Here is the linkage:
$ ldd optimalPairing
        libXm.so.2 => /usr/X11R6/lib/libXm.so.2 (0x4001d000)
        libXpm.so.4 => /usr/X11R6/lib/libXpm.so.4 (0x401b9000)
        libXt.so.6 => /usr/X11R6/lib/libXt.so.6 (0x401c8000)
        libXext.so.6 => /usr/X11R6/lib/libXext.so.6 (0x40214000)
        libX11.so.6 => /usr/X11R6/lib/libX11.so.6 (0x40223000)
        libICE.so.6 => /usr/X11R6/lib/libICE.so.6 (0x40319000)
        libSM.so.6 => /usr/X11R6/lib/libSM.so.6 (0x40330000)
        libXmu.so.6 => /usr/X11R6/lib/libXmu.so.6 (0x40339000)
        libdonar.so.4 => /home/kellogg/donar_newace/DaProg/donar/cpp/libdonar.so.4 (0x4034f000)
        libpthread.so.0 => /lib/libpthread.so.0 (0x4035a000)
        libstdc++-libc6.2-2.so.3 => /usr/lib/libstdc++-libc6.2-2.so.3 (0x40372000)
        libm.so.6 => /lib/libm.so.6 (0x403b5000)
        libc.so.6 => /lib/libc.so.6 (0x403d7000)
        libdaci.so.4 => /home/kellogg/donar_newace/DaCache/daci/cpp/libdaci.so.4 (0x4050c000)
        libdalli.so.4 => /home/kellogg/donar_newace/DaTransport/dalli/cpp/libdalli.so.4 (0x4059e000)
        libACE.so.5.2.7 => /home/kellogg/ace-cvs/ACE_wrappers/ace/libACE.so.5.2.7 (0x406e0000)
        libXp.so.6 => /usr/X11R6/lib/libXp.so.6 (0x40914000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
        libTAO_CosEvent.so.1.2.7 => /home/kellogg/ace-cvs/ACE_wrappers/ace/libTAO_CosEvent.so.1.2.7 (0x4091c000)
        libTAO_RTEvent.so.1.2.7 => /home/kellogg/ace-cvs/ACE_wrappers/ace/libTAO_RTEvent.so.1.2.7 (0x40b42000)
        libTAO_RTSched.so.1.2.7 => /home/kellogg/ace-cvs/ACE_wrappers/ace/libTAO_RTSched.so.1.2.7 (0x40d59000)
        libTAO_CosNaming.so.1.2.7 => /home/kellogg/ace-cvs/ACE_wrappers/ace/libTAO_CosNaming.so.1.2.7 (0x40e1f000)
        libTAO_Svc_Utils.so.1.2.7 => /home/kellogg/ace-cvs/ACE_wrappers/ace/libTAO_Svc_Utils.so.1.2.7 (0x40f1c000)
        libTAO_IORTable.so.1.2.7 => /home/kellogg/ace-cvs/ACE_wrappers/ace/libTAO_IORTable.so.1.2.7 (0x40f2c000)
        libTAO_Messaging.so.1.2.7 => /home/kellogg/ace-cvs/ACE_wrappers/ace/libTAO_Messaging.so.1.2.7 (0x40f43000)
        libTAO_PortableServer.so.1.2.7 => /home/kellogg/ace-cvs/ACE_wrappers/ace/libTAO_PortableServer.so.1.2.7 (0x40fa7000)
        libTAO.so.1.2.7 => /home/kellogg/ace-cvs/ACE_wrappers/ace/libTAO.so.1.2.7 (0x411d7000)
        libTAO_Strategies.so.1.2.7 => /home/kellogg/ace-cvs/ACE_wrappers/ace/libTAO_Strategies.so.1.2.7 (0x41523000)
        libdl.so.2 => /lib/libdl.so.2 (0x41607000)
        librt.so.1 => /lib/librt.so.1 (0x4160b000)

    REPEAT BY:
        The problem is 100% reproducible.  A lot of source code is needed
        for this program; I will try to narrow it down.
        The same program runs fine when built with ACE-5.2/TAO-1.2.

    SAMPLE FIX/WORKAROUND:
        -
--------------------------------------------------------------------------------

From: Andreas Koehler (Andreas.Koehler@sysde.eads.net)
Subject: Re: [tao-bugs] Re: TAO 1.2.7: core dump in 
__static_initialization_and_destruction_0 
Date: 2002-12-13 04:32:38 PST 
 

Hi,

On Thu, 12 Dec 2002, Krishnakumar B wrote:
>On Thursday, 12 December 2002, Oliver Kellogg wrote:
>> This shared lib is built using TAO libraries. The TAO
>> libraries are not mentioned on linking the t_lib main.
>No, I think that it is an error in your Makefile. Your program links
>because it might not use any of the functions in TAO directly. Whenever you
>link, you need to give the complete list of libraries (till -lpthread
>-lrt). Check out any of the ACE Makefiles.

This sounds to me like a design weakness in ACE/TAO. Maybe the
dependencies between some statics are too complex for the linker. Have you
ever thought about substituting those statics with private
initialize-methods or whatever?
If not, and/or such a change is not planned for the future, I suppose you
will get more problems with every new feature that brings new
complexity as well.
On the other hand, not supporting shared linking encapsulation is a very
strong argument against the usage of TAO. How should we explain to the
users of our infrastructure library that they have to link with TAO
although they do not use it?
Last but not least, you won't be able to work together with software
components that depend on libraries that are loaded via dlopen (for
instance, components written in Java that use JNI).


From: Douglas C. Schmidt (schmidt@cse.wustl.edu)
Subject: [tao-users] Re: [tao-bugs] Re: TAO 1.2.7: core dump in 
__static_initialization_and_destruction_0 
Newsgroups: comp.soft-sys.ace
Date: 2002-12-14 06:04:58 PST 
 
[...]
It's certainly possible to support shared linking encapsulation using
ACE+TAO.  In fact, there are a number of projects that do this
already, including the DMSO HLA/RTI and the TENA program.  The trick,
of course, is supporting shared linking encapsulation in a portable
way, which is something we haven't had time to do yet.  If you know
how to do this please let us know.


From: Oliver Kellogg (oliver.kellogg@sysde.eads.net)
Subject: [tao-users] Re: [tao-bugs] Re: TAO 1.2.7: core dump in 
__static_initialization_and_destruction_0 
Newsgroups: comp.soft-sys.ace
Date: 2002-12-18 09:03:23 PST 
 
Krishnakumar B wrote:
> 
> RedHat 8.0 with gcc-3.2. So the problem is with your environment.
> 

Thanks for this info. I have been able to verify it.

So evidently RedHat 7.1 (even with the newest gcc-2.96/binutils etc.
updates from RedHat) has some sort of basic problem with that usage
of linking.

From: Balachandran Natarajan (bala@cse.wustl.edu)
Subject: Re: [tao-bugs] Re: TAO 1.2.7: core dump in 
__static_initialization_and_destruction_0 
Newsgroups: comp.soft-sys.ace
Date: 2003-01-01 14:58:36 PST 
 
Oliver-

> I've done some grepping in the TAO sources and to my delight
> I can assert that the single culprit of these problems is the
> static TAO_Resource_Factory_Changer TAO_changer in
> $TAO_ROOT/tao/Strategies/advanced_resource.h.
> 
> IMHO the fix should then be to change the static object into
> a pointer and delegate the allocation of that object to an
> explicit initialization method.
> 
> What do you think?

IMHO, that would be a chicken and egg problem. You may need another
static object or static method to allocate that object. If you
look at the TAO_Resource_Factory_Changer's constructor implementation,
it makes a few process directive calls on the service configurator. 
This object is in place to do just that. Unless I am missing something 
subtle, I am not sure how this would help. 


From: Oliver Kellogg (Oliver.Kellogg@t-online.de)
Subject: Re: [tao-bugs] Re: TAO 1.2.7: core dump in 
__static_initialization_and_destruction_0 
Newsgroups: comp.soft-sys.ace
Date: 2003-01-01 21:38:50 PST 
 

I am proposing that the TAO_changer object be removed, and that
the TAO_Resource_Factory_Changer class have a new method,
the_instance() or some such, that must be used instead of the
TAO_changer. The pattern is described in

  http://www.parashift.com/c++-faq-lite/ctors.html#faq-10.11
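
The proposal above could be sketched roughly as follows. This is a
minimal illustration of the construct-on-first-use pattern from the FAQ
entry, not actual TAO code: Service_Repo, the_repo(), and the simplified
Resource_Factory_Changer are hypothetical stand-ins, and only the name
the_instance() and the string "Advanced_Resource_Factory" come from this
report.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the service configurator repository.
class Service_Repo {
public:
  void add(const std::string& entry) { entries_.push_back(entry); }
  std::size_t size() const { return entries_.size(); }
private:
  std::vector<std::string> entries_;
};

// Construct on first use: the repository is created the first time
// anyone asks for it, so callers never see it unconstructed.
// It is deliberately never deleted, as the FAQ entry suggests.
inline Service_Repo& the_repo() {
  static Service_Repo* repo = new Service_Repo;
  return *repo;
}

// Simplified stand-in for TAO_Resource_Factory_Changer.
class Resource_Factory_Changer {
public:
  // Replaces the namespace-scope static object. The constructor runs
  // exactly once, on the first call.
  static Resource_Factory_Changer& the_instance() {
    static Resource_Factory_Changer* instance = new Resource_Factory_Changer;
    return *instance;
  }
  const std::string& factory_name() const { return name_; }
private:
  Resource_Factory_Changer() : name_("Advanced_Resource_Factory") {
    // Safe regardless of link order: the_repo() constructs on demand.
    the_repo().add(name_);
  }
  std::string name_;
};
```

Any code that needs the altered repository would call
Resource_Factory_Changer::the_instance() first. Note that initialization
of function-local statics is only guaranteed to be thread-safe from
C++11 on; with the compilers discussed in this thread, the first call
would have to happen before additional threads are started.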


From: Oliver Kellogg (Oliver.Kellogg@t-online.de)
Subject: Re: [tao-bugs] Re: TAO 1.2.7: core dump in 
__static_initialization_and_destruction_0 
Newsgroups: comp.soft-sys.ace
Date: 2003-01-02 00:58:50 PST 
 

Hi Bala,
>
>       When  will the call to the_instance () be
> made? Before we get a chance to call the_instance (), which would
> basically add stuff to the service_config, the entries in the
> service_config repository could be accessed.

The question is: What is the exact point in time that the
TAO_Changer changes the service_config repo?
We could be accessing the repo either before or after it has been
altered by the TAO_Changer, but due to static initialization issues,
we have no way of knowing.

If the accesses shall be made on the altered repo then all such
accesses must themselves call the_instance().

Comment 1 Oliver M. Kellogg 2003-01-03 06:46:16 CST
From: Balachandran Natarajan <bala@cse.wustl.edu>
Message-ID: <15892.28251.182043.118823@macarena.cs.wustl.edu>
Date: Thu, 2 Jan 2003 10:52:43 -0600
To: Oliver.Kellogg@t-online.de (Oliver Kellogg)
Cc: tao-bugs@cs.wustl.edu, bala@cse.wustl.edu
Subject: Re: [tao-bugs] Re: TAO 1.2.7: core dump in 
__static_initialization_and_destruction_0

[...]

On Thursday, 2 January, 2003 at 09:56:48 +0100, Oliver Kellogg wrote:
[snipped]
 > The question is: What is the exact point in time that the
 > TAO_Changer changes the service_config repo?

When you initialize the ORB. As part of its initialization, the ORB
configures the resource factory that needs to be used. 

 > We could be accessing the repo either before or after it has been
 > altered by the TAO_Changer, but due to static initialization issues,
 > we have no way of knowing.

Since we have a static object, the implicit assumption is that the
objects are constructed before it reaches main (). But there is a
subtle problem with the ordering though, which hasn't bitten us yet
(touch wood :-))

 > If the accesses shall be made on the altered repo then all such
 > accesses must themselves call the_instance().

The ORB doesn't really know whether the repo needs to be altered or
not. If the user wants to configure the ORB with the stuff in the
strategies library, then he would have linked in the library and
included the advanced_resource.h file. This would pull in the symbols
to construct the object and add stuff to the service_config repository. 

Am I missing something here?

Comment 2 Nanbor Wang 2003-01-03 10:35:01 CST
Accept for tao-support 

Comment 3 Nanbor Wang 2003-01-11 08:34:40 CST
Accepted for tao-support

Comment 4 Nanbor Wang 2003-01-16 05:02:42 CST
Not sure how this got closed. This is *still* a problem.

Comment 5 Oliver M. Kellogg 2003-01-29 03:09:33 CST
From: Carlos O'Ryan (spam_magnet@dev.null)
Subject: Re: [tao-bugs] Re: TAO 1.2.7: core dump in
__static_initialization_and_destruction_0 
Newsgroups: comp.soft-sys.ace
Date: 2003-01-02 09:50:07 PST 


Hi,

Oliver.Kellogg@t-online.de (Oliver Kellogg) writes:

> Instead, we need an explicit call for the TAO_Changer to do its
> thing.

        Without more context your statement triggers an automatic
response of mine:  do *NOT* make the life of people that have decent
compiler/linker combos harder because somebody has a bad toolset/OS.

        If there is a reason to disable the automatic initialization
in advanced_resource.h the procedure should be like this:

1) For platforms where the automatic initialization works, leave it as
   it is.
2) Disable the automatic initialization for platforms where it does
   not work.
3) Provide an explicit initializer for platforms that need it.
4) Make the explicit initializer a no-op for platforms that do not
   need it.


--- end of quote from posting by Carlos ---
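
For reference, the four steps Carlos lists could look roughly like the
sketch below. Nothing here is existing ACE/TAO API: the macro
EXAMPLE_LACKS_STATIC_INIT, the namespace, and all function names are
assumptions made up for illustration, and the "placeholder_default"
string is a placeholder rather than a real factory name.

```cpp
#include <string>

// Hypothetical configuration macro: define it on platforms where
// static constructors cannot be relied upon.
// #define EXAMPLE_LACKS_STATIC_INIT

namespace example {

// Function-local statics, so these are valid whenever they are used.
inline std::string& resource_factory_name() {
  static std::string name = "placeholder_default";
  return name;
}

inline bool& installed_flag() {
  static bool installed = false;
  return installed;
}

// The actual work; idempotent, so calling it twice is harmless.
inline void install_advanced_factory() {
  if (installed_flag()) return;
  resource_factory_name() = "Advanced_Resource_Factory";
  installed_flag() = true;
}

#if !defined(EXAMPLE_LACKS_STATIC_INIT)
// Step 1: keep the automatic initialization where it works.
struct Auto_Installer {
  Auto_Installer() { install_advanced_factory(); }
};
static Auto_Installer auto_installer;

// Step 4: the explicit initializer exists but is a no-op.
inline void init_advanced_resources() {}
#else
// Step 2: no static object is defined on this platform.
// Step 3: the application must call the explicit initializer.
inline void init_advanced_resources() { install_advanced_factory(); }
#endif

}  // namespace example
```

Portable application code would then call
example::init_advanced_resources() near the start of main() on every
platform; the call costs nothing where the static constructor has
already done the work.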

I would argue that the problem is a conceptual one.
How do you expect a linker to find out about the dependencies?
It would have to analyze the executable statements in the
code to infer the dependencies - a totally non-trivial
and non-intuitive task IMHO. (For example, what about
possible recursions.)

So, contrary to what the Summary title and some of the
previous discussion may suggest, I am NOT reporting this
as a problem on a specific tool environment, but as a
general problem.

Oliver

Comment 6 Oliver M. Kellogg 2003-02-17 09:07:22 CST
I just found further evidence - within the very TAO source tree itself.

Near the beginning of the main program in
 $TAO_ROOT/orbsvcs/examples/RtEC/MCast/MCast.cpp,
there is a comment:

  // Register the default factory in the Service Configurator.
  // If your platform supports static constructors then you can
  // simply using the ACE_STATIC_SVC_DEFINE() macro, unfortunately TAO
  // must run on platforms where static constructors do not work well,
  // so we have to explicitly invoke this function.
  TAO_EC_Default_Factory::init_svcs ();

I would venture that users that haven't run into the problem are
at least sensitized when they read this comment.

So again, my vote is to not rely on implicit ordering of static
initialization at all - in particular because the problem is
aggravated when using dlopen().

Oliver