Bug 2980

Summary: dynamically loading library that depends on ACE from thread crashes
Product: ACE Reporter: Lothar Werzinger <lothar>
Component: ACE Core Assignee: Iliyan Jeliazkov <iliyan>
Status: RESOLVED FIXED    
Severity: major CC: iliyan, jwillemsen, patrick.bennett, seibel_r, sma
Priority: P3    
Version: 5.5.8   
Hardware: x86   
OS: Linux   
Bug Depends on: 2963    
Bug Blocks: 2995, 3108, 3129    
Attachments: test case (tar.gz)
Proposed patch
Modified test case (manual)
updated test case

Description Lothar Werzinger 2007-07-05 13:54:17 CDT
ACE VERSION: 5.5.8

    HOST MACHINE and OPERATING SYSTEM:
Linux zaphod 2.6.15-28-amd64-generic #1 SMP PREEMPT Thu May 10 09:46:40 UTC 2007 x86_64 GNU/Linux

    TARGET MACHINE and OPERATING SYSTEM, if different from HOST:
    COMPILER NAME AND VERSION (AND PATCHLEVEL):
gcc (GCC) 4.1.2

    THE $ACE_ROOT/ace/config.h FILE [if you use a link to a platform-
    specific file, simply state which one]:
#define ACE_HAS_XML_SVC_CONF
// workaround for bug in gcc 4.0.x
#define ACE_LACKS_PRAGMA_ONCE
#include "ace/config-linux.h"

    THE $ACE_ROOT/include/makeinclude/platform_macros.GNU FILE [if you
    use a link to a platform-specific file, simply state which one
    (unless this isn't used in this case, e.g., with Microsoft Visual
    C++)]: 
# configure ACE/TAO for our use

templates=automatic
debug=1
optimize=1
exceptions=1
threads=1
inline=1
rtti=1
versioned_so=1
ssl=1
#no_hidden_visibility=1


# Disable the RCSID for release/non-debug builds.
CPPFLAGS += -DACE_USE_RCSID=0

CC = /opt2/linux/ix86/x86_64-pc-linux-gnu/bin/gcc
CXX = /opt2/linux/ix86/x86_64-pc-linux-gnu/bin/g++
CFLAGS += -m64 -I/opt2/linux/x86_64/include
CCFLAGS += -m64 -I/opt2/linux/x86_64/include
LDFLAGS += -m64 -I/opt2/linux/x86_64/include -L/opt2/linux/x86_64/lib

TAO_IDL_PREPROCESSOR = /opt2/linux/ix86/x86_64-pc-linux-gnu/bin/gcc

include $(ACE_ROOT)/include/makeinclude/platform_linux.GNU


    CONTENTS OF $ACE_ROOT/bin/MakeProjectCreator/config/default.features
    (used by MPC when you generate your own makefiles):

    AREA/CLASS/EXAMPLE AFFECTED:
dynamic loading

    DOES THE PROBLEM AFFECT:
        COMPILATION?
        LINKING?
            On Unix systems, did you run make realclean first?
        EXECUTION?
        OTHER (please specify)?
runtime

    SYNOPSIS:
loading a library that utilizes ACE from a thread does not work

    DESCRIPTION:

The test case (to be attached) works as expected when compiled without -DUSE_THREAD:

main - entered
loadDll - entered
loadDll - leaving
loadDll finished
(27351|46912507654896) capi_dosomething - entered
(27351|46912507654896) capi_dosomething - leaving
main - leaving

The test case (to be attached) fails when compiled with -DUSE_THREAD:

main - entered
loadDll - entered
loadDll - leaving
*** glibc detected *** free(): invalid pointer: 0x00000000005033c8 ***

    REPEAT BY:
run again

    SAMPLE FIX/WORKAROUND:
unknown
Comment 1 Lothar Werzinger 2007-07-05 13:56:45 CDT
Created attachment 810 [details]
test case (tar.gz)

This is the MPC based testcase.
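
The attachment itself is not reproduced in this thread. The following is only a minimal sketch of the scenario described above, assuming plain dlopen() and pthreads: a worker thread loads the library and the main thread then calls into it. The names LoadRequest, load_dll_thread and load_from_thread_and_call are hypothetical; capi_dosomething is taken from the log output, and its signature (no arguments, no return value, C linkage) is an assumption.

```cpp
#include <dlfcn.h>
#include <pthread.h>
#include <cstdio>

// Hypothetical request passed to the worker thread.
struct LoadRequest {
    const char* path;   // library to load, e.g. "./libcapi.so"
    void* handle;       // result of dlopen(), or nullptr on failure
};

// Thread body: load the library from a non-main thread.
extern "C" void* load_dll_thread(void* arg) {
    LoadRequest* req = static_cast<LoadRequest*>(arg);
    std::puts("loadDll - entered");
    req->handle = dlopen(req->path, RTLD_NOW);
    if (req->handle == nullptr) {
        std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
    }
    std::puts("loadDll - leaving");
    return nullptr;
}

// Loads `path` in a worker thread, then calls `symbol` from the caller.
// Returns true only if the library loaded and the symbol was found.
bool load_from_thread_and_call(const char* path, const char* symbol) {
    LoadRequest req = { path, nullptr };
    pthread_t worker;
    if (pthread_create(&worker, nullptr, load_dll_thread, &req) != 0) {
        return false;
    }
    pthread_join(worker, nullptr);  // worker exits here; its TSD destructors run
    if (req.handle == nullptr) {
        return false;
    }
    typedef void (*EntryPoint)();   // assumed signature, not confirmed by the thread
    EntryPoint entry = reinterpret_cast<EntryPoint>(dlsym(req.handle, symbol));
    const bool found = (entry != nullptr);
    if (found) {
        entry();
    }
    dlclose(req.handle);
    return found;
}
```

A caller would use it as load_from_thread_and_call("./libcapi.so", "capi_dosomething"). Depending on the glibc version, linking may need -ldl -lpthread, the same libraries that appear in the build log later in this thread.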
Comment 2 Johnny Willemsen 2007-07-06 07:04:15 CDT
build@legolas:~/ACE/gcc/ACE_wrappers/tests/2980> valgrind ./bug2980
==9265== Memcheck, a memory error detector.
==9265== Copyright (C) 2002-2005, and GNU GPL'd, by Julian Seward et al.
==9265== Using LibVEX rev 1575, a library for dynamic binary translation.
==9265== Copyright (C) 2004-2005, and GNU GPL'd, by OpenWorks LLP.
==9265== Using valgrind-3.1.1, a dynamic binary instrumentation framework.
==9265== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et al.
==9265== For more details, rerun with: -v
==9265==
main - entered
loadDll - entered
loadDll - leaving
==9265== Thread 2:
==9265== Invalid free() / delete / delete[]
==9265==    at 0x4020BE5: operator delete(void*) (in /usr/lib/valgrind/x86-linux/vgpreload_memcheck.so)
==9265==    by 0x4C25347: ACE_Service_Config::~ACE_Service_Config() (Service_Config.cpp:556)
==9265==    by 0x4C25664: ACE_TSS<ACE_Service_Gestalt>::cleanup(void*) (TSS_T.cpp:91)
==9265==    by 0x404302E: __nptl_deallocate_tsd (in /lib/libpthread-2.4.so)
==9265==    by 0x4043358: start_thread (in /lib/libpthread-2.4.so)
==9265==    by 0x422965D: clone (in /lib/libc-2.4.so)
==9265==  Address 0x4A8F4D4 is 4 bytes inside a block of size 88 alloc'd
==9265==    at 0x40211B1: operator new(unsigned, std::nothrow_t const&) (in /usr/lib/valgrind/x86-linux/vgpreload_memcheck.so)
==9265==    by 0x4C25FB6: ACE_Unmanaged_Singleton<ACE_Service_Config, ACE_Recursive_Thread_Mutex>::instance() (Singleton.cpp:174)
==9265==    by 0x4C24826: ACE_Service_Config::global() (Service_Config.cpp:324)
==9265==    by 0x4C24D96: ACE_Service_Config::instance() (Service_Config.inl:63)
==9265==    by 0x4C24DF6: ACE_Service_Config::static_svcs() (Service_Config.cpp:316)
==9265==    by 0x4C86DAC: ACE_Object_Manager::init() (Object_Manager.cpp:134)
==9265==    by 0x4C86F44: ACE_Object_Manager::ACE_Object_Manager() (Object_Manager.cpp:312)
==9265==    by 0x4C8700A: ACE_Object_Manager::instance() (Object_Manager.cpp:332)
==9265==    by 0x4C87090: ACE_Object_Manager_Manager::ACE_Object_Manager_Manager() (Object_Manager.cpp:757)
==9265==    by 0x4C870D5: __static_initialization_and_destruction_0(int, int) (Object_Manager.cpp:773)
==9265==    by 0x4CDAAB4: (within /home/build/ACE/gcc/ACE_wrappers/ace/libACE.so.5.5.9)
==9265==    by 0x4C0638C: (within /home/build/ACE/gcc/ACE_wrappers/ace/libACE.so.5.5.9)
loadDll thread finished
(9265|69782176) capi_dosomething - entered
(9265|69782176) capi_dosomething - leaving
main - leaving
==9265==
==9265== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 15 from 1)
==9265== malloc/free: in use at exit: 15,458 bytes in 39 blocks.
==9265== malloc/free: 53 allocs, 15 frees, 41,141 bytes allocated.
==9265== For counts of detected errors, rerun with: -v
==9265== searching for pointers to 39 not-freed blocks.
==9265== checked 8,578,320 bytes.
==9265==
==9265== LEAK SUMMARY:
==9265==    definitely lost: 0 bytes in 0 blocks.
==9265==      possibly lost: 144 bytes in 1 blocks.
==9265==    still reachable: 15,314 bytes in 38 blocks.
==9265==         suppressed: 0 bytes in 0 blocks.
==9265== Reachable blocks (those to which a pointer was found) are not shown.
==9265== To see them, rerun with: --show-reachable=yes
Comment 3 Johnny Willemsen 2007-07-06 07:35:09 CDT
mine
Comment 4 Johnny Willemsen 2007-07-10 10:14:22 CDT
seems related to 2963
Comment 5 Johnny Willemsen 2007-07-28 01:46:37 CDT
Added depends
Comment 6 Iliyan Jeliazkov 2007-07-29 19:51:59 CDT
Reversing dependencies
Comment 7 Iliyan Jeliazkov 2007-07-29 19:54:31 CDT
Created attachment 836 [details]
Proposed patch

Attached is a patch that resolves the race between TSS cleanup and Object Manager's destruction of the global Service Configuration. Tested on Windows and Linux.
Comment 8 Iliyan Jeliazkov 2007-07-29 20:06:41 CDT
Created attachment 837 [details]
Modified test case (manual)

Modified the original test case to reproduce the issue on Windows, too. The problem was due to an ACE_TSS::cleanup () being called upon thread 2's exit *before* ~ACE_Service_Config executes.

Because we need a pointer to the current SC in each thread, we can't avoid using TSS. However, because ACE_TSS is designed to take over ownership of the instance, I have been careful to reset the tss_ member to 0 upon destroying the SC. Unfortunately, that is insufficient, as your test case demonstrates.

The proposed change effectively makes ACE_TSS<ACE_Service_Config>::cleanup() a no-op by providing the corresponding partial specialization.

--Iliyan Jeliazkov
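
The patch itself is attached rather than quoted here. As a rough illustration of the idea only, the sketch below uses a hypothetical TssSketch template (not the ACE_TSS source) whose default cleanup() deletes the stored object, plus an explicit specialization of that static member for one type that turns cleanup() into a no-op. OwnedByTss and OwnedElsewhere are made-up types standing in for "an object the TSS wrapper owns" and "an object some other owner destroys".

```cpp
// Hypothetical stand-in for a TSS wrapper; NOT the real ACE_TSS.
template <typename T>
struct TssSketch {
    // Default policy: the wrapper owns the object and deletes it
    // when the thread library invokes the cleanup hook.
    static void cleanup(void* ptr) { delete static_cast<T*>(ptr); }
};

// An object the wrapper is allowed to destroy.
struct OwnedByTss {
    static int destroyed;
    ~OwnedByTss() { ++destroyed; }
};
int OwnedByTss::destroyed = 0;

// An object whose lifetime is managed elsewhere (assumption: a global owner).
struct OwnedElsewhere {
    static int destroyed;
    ~OwnedElsewhere() { ++destroyed; }
};
int OwnedElsewhere::destroyed = 0;

// Explicit specialization of the member: cleanup does nothing, so a
// thread exiting early cannot destroy the shared object.
template <>
inline void TssSketch<OwnedElsewhere>::cleanup(void*) {}

// Returns how many objects the default cleanup destroyed.
int destroyed_by_default_cleanup() {
    OwnedByTss::destroyed = 0;
    TssSketch<OwnedByTss>::cleanup(new OwnedByTss);
    return OwnedByTss::destroyed;
}

// Returns how many objects the no-op cleanup destroyed; the real
// owner then deletes the object exactly once.
int destroyed_by_noop_cleanup() {
    OwnedElsewhere::destroyed = 0;
    OwnedElsewhere* shared = new OwnedElsewhere;
    TssSketch<OwnedElsewhere>::cleanup(shared);
    const int seen = OwnedElsewhere::destroyed;
    delete shared;
    return seen;
}
```

The trade-off in such a design is that the specialized type is never freed by thread exit, so some other component has to remain responsible for destroying it.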
Comment 9 Iliyan Jeliazkov 2007-07-29 20:11:41 CDT
*** Bug 2963 has been marked as a duplicate of this bug. ***
Comment 10 Iliyan Jeliazkov 2007-07-29 20:12:52 CDT
Here's valgrind's output _after_ the patch:


$ valgrind ./bug2980
==26999== Memcheck, a memory error detector.
==26999== Copyright (C) 2002-2005, and GNU GPL'd, by Julian Seward et al.
==26999== Using LibVEX rev 1471, a library for dynamic binary translation.
==26999== Copyright (C) 2004-2005, and GNU GPL'd, by OpenWorks LLP.
==26999== Using valgrind-3.1.0, a dynamic binary instrumentation framework.
==26999== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et al.
==26999== For more details, rerun with: -v
==26999==
--26999-- WARNING: unhandled syscall: 311
--26999-- You may be able to write your own handler.
--26999-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
main - entered
loadDll - entered
loadDll - leaving
loadDll thread finished
(26999|67263072) capi_dosomething - entered
(26999|67263072) capi_dosomething - leaving
main - leaving
==26999==
==26999== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 28 from 1)
==26999== malloc/free: in use at exit: 15,585 bytes in 43 blocks.
==26999== malloc/free: 53 allocs, 10 frees, 41,232 bytes allocated.
==26999== For counts of detected errors, rerun with: -v
==26999== searching for pointers to 43 not-freed blocks.
==26999== checked 10,646,948 bytes.
==26999==
==26999== LEAK SUMMARY:
==26999==    definitely lost: 0 bytes in 0 blocks.
==26999==      possibly lost: 144 bytes in 1 blocks.
==26999==    still reachable: 15,441 bytes in 42 blocks.
==26999==         suppressed: 0 bytes in 0 blocks.
==26999== Reachable blocks (those to which a pointer was found) are not shown.
==26999== To see them, rerun with: --show-reachable=yes
Comment 11 Johnny Willemsen 2007-07-30 08:12:00 CDT
Iliyan, can you commit the patch? We can then use the scoreboard to see whether it works in all builds.
Comment 12 Iliyan Jeliazkov 2007-07-30 20:04:07 CDT
Committed revision 79101.

Tue Jul 31 00:50:49 UTC 2007  Iliyan Jeliazkov  <iliyan@ociweb.com>

        * ace/Service_Config.cpp:

          By introducing a partial specialization of
          ACE_TSS<ACE_Service_Config> we ensure that _if_ ACE_TSS::cleanup()
          is called before ~ACE_Object_Manager(), the TSS pointer will not
          clobber the ACE_Service_Config it points to. Resolves bugzilla
          2980. Thanks to Patrick Bennett <Patrick dot Bennett at inin dot
          com> and Lothar Werzinger <lothar at tradescape dot biz> for
          their input.
Comment 13 Lothar Werzinger 2007-07-31 13:26:15 CDT
Created attachment 840 [details]
updated test case

Here's an updated test case that produces the error with the patch installed

lothar@zaphod$ ./build.sh
ACE_ROOT=/opt2/linux/x86_64/ACE/1.5.8/ACE_wrappers
TAO_ROOT=/opt2/linux/x86_64/ACE/1.5.8/ACE_wrappers/TAO
CIAO_ROOT=/opt2/linux/x86_64/ACE/1.5.8/ACE_wrappers/TAO/CIAO
DDS_ROOT=/tmp/notthere
Using .../1.5.8/ACE_wrappers/bin/MakeProjectCreator/config/MPC.cfg
Generating 'gnuace' output using default input
Generation Time: 0s
make[1]: Entering directory `/home/lothar/tmp/bug2980'
touch .depend.capi
make[1]: Leaving directory `/home/lothar/tmp/bug2980'
make[1]: Entering directory `/home/lothar/tmp/bug2980'

GNUmakefile: /home/lothar/tmp/bug2980/GNUmakefile.capi MAKEFLAGS=w -- debug=1 optimize=0

rm -f -r \
        *.o *~ *.bak *.rpo *.sym lib*.*_pure_* \
        GNUmakefile.old core-r  \
        cxx_repository ptrepository ti_files \
        gcctemp.c gcctemp so_locations *.ics \
        templateregistry templateregistry.* ir.out core.* *.core  .shobj/capi.o
make[1]: Leaving directory `/home/lothar/tmp/bug2980'
make[1]: Entering directory `/home/lothar/tmp/bug2980'
touch .depend.bug2980
make[1]: Leaving directory `/home/lothar/tmp/bug2980'
make[1]: Entering directory `/home/lothar/tmp/bug2980'

GNUmakefile: /home/lothar/tmp/bug2980/GNUmakefile.bug2980 MAKEFLAGS=w -- debug=1 optimize=0

rm -f -r \
        *.o *~ *.bak *.rpo *.sym lib*.*_pure_* \
        GNUmakefile.old core-r  \
        cxx_repository ptrepository ti_files \
        gcctemp.c gcctemp so_locations *.ics \
        templateregistry templateregistry.* ir.out core.* *.core .obj/bug2980.o .obj/bug2980.o .obj/bug2980.o
make[1]: Leaving directory `/home/lothar/tmp/bug2980'
make[1]: Entering directory `/home/lothar/tmp/bug2980'

GNUmakefile: /home/lothar/tmp/bug2980/GNUmakefile.capi MAKEFLAGS=w -- debug=1 optimize=0

/opt2/linux/ix86/x86_64-pc-linux-gnu/bin/g++ -m64 -I/opt2/linux/x86_64/include -fvisibility=hidden -fvisibility-inlines-hidden -m64 -I/opt2/linux/x86_64/include -W -Wall -Wpointer-arith -g -pipe   -pipe -DACE_USE_RCSID=0 -D_REENTRANT -DACE_HAS_AIO_CALLS -D_GNU_SOURCE   -I/opt2/linux/x86_64/ACE/1.5.8/ACE_wrappers -DACE_HAS_EXCEPTIONS -D__ACE_INLINE__ -I/opt2/linux/x86_64/ACE/1.5.8/ACE_wrappers  -c -fPIC -o .shobj/capi.o capi.cpp
/opt2/linux/ix86/x86_64-pc-linux-gnu/bin/g++ -DACE_USE_RCSID=0 -D_REENTRANT -DACE_HAS_AIO_CALLS -D_GNU_SOURCE   -I/opt2/linux/x86_64/ACE/1.5.8/ACE_wrappers -DACE_HAS_EXCEPTIONS -D__ACE_INLINE__ -I/opt2/linux/x86_64/ACE/1.5.8/ACE_wrappers -shared -Wl,-h -Wl,libcapi.so.5.5.8 -o libcapi.so.5.5.8 .shobj/capi.o -m64 -I/opt2/linux/x86_64/include -L/opt2/linux/x86_64/lib -Wl,-E -L/opt2/linux/x86_64/ACE/1.5.8/ACE_wrappers/ace -L./ -L/home/lothar/tmp/bug2980 -L. -L/opt2/linux/x86_64/ACE/1.5.8/ACE_wrappers/lib -lACE -ldl -lpthread -lrt
rm -f libcapi.so
ln -s libcapi.so.5.5.8 libcapi.so
chmod a+rx libcapi.so.5.5.8
make[1]: Leaving directory `/home/lothar/tmp/bug2980'
make[1]: Entering directory `/home/lothar/tmp/bug2980'

GNUmakefile: /home/lothar/tmp/bug2980/GNUmakefile.bug2980 MAKEFLAGS=w -- debug=1 optimize=0

/opt2/linux/ix86/x86_64-pc-linux-gnu/bin/g++ -m64 -I/opt2/linux/x86_64/include -fvisibility=hidden -fvisibility-inlines-hidden -m64 -I/opt2/linux/x86_64/include -W -Wall -Wpointer-arith -g -pipe   -pipe -DACE_USE_RCSID=0 -D_REENTRANT -DACE_HAS_AIO_CALLS -D_GNU_SOURCE   -I/opt2/linux/x86_64/ACE/1.5.8/ACE_wrappers -DACE_HAS_EXCEPTIONS -D__ACE_INLINE__ -DUSE_THREAD  -c -o .obj/bug2980.o bug2980.cpp
bug2980.cpp: In function ‘int main(int, char**)’:
bug2980.cpp:77: warning: unused variable ‘result’
bug2980.cpp: At global scope:
bug2980.cpp:75: warning: unused parameter ‘argc’
bug2980.cpp:75: warning: unused parameter ‘argv’
/opt2/linux/ix86/x86_64-pc-linux-gnu/bin/g++ -m64 -I/opt2/linux/x86_64/include -fvisibility=hidden -fvisibility-inlines-hidden -m64 -I/opt2/linux/x86_64/include -W -Wall -Wpointer-arith -g -pipe   -pipe -DACE_USE_RCSID=0 -D_REENTRANT -DACE_HAS_AIO_CALLS -D_GNU_SOURCE   -I/opt2/linux/x86_64/ACE/1.5.8/ACE_wrappers -DACE_HAS_EXCEPTIONS -D__ACE_INLINE__ -DUSE_THREAD  -m64 -I/opt2/linux/x86_64/include -L/opt2/linux/x86_64/lib -Wl,-E -L/opt2/linux/x86_64/ACE/1.5.8/ACE_wrappers/ace -L./ -L/opt2/linux/x86_64/ACE/1.5.8/ACE_wrappers/lib -L. -o bug2980 .obj/bug2980.o  -ldl -lpthread -lrt
make[1]: Leaving directory `/home/lothar/tmp/bug2980'
LD_LIBRARY_PATH=.:/opt2/linux/x86_64/ACE/1.5.8/ACE_wrappers/lib
main - entered
loadDll - entered
ACE::init()
loadDll - leaving
(11885|1082132832) capi_dosomething - entered
(11885|1082132832) capi_dosomething - leaving
unloadDll - entered
ACE::fini()
unloadDll - leaving
./build.sh: line 82: 11885 Segmentation fault      (core dumped) ./${bugname}
<~/tmp/bug2980>
lothar@zaphod$ gdb --core core* ./bug2980
GNU gdb 6.4-debian
Copyright 2005 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".

Core was generated by `./bug2980'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libpthread.so.0...done.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/librt.so.1...done.
Loaded symbols for /lib/librt.so.1
Reading symbols from /usr/lib/libstdc++.so.6...done.
Loaded symbols for /usr/lib/libstdc++.so.6
Reading symbols from /lib/libm.so.6...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libgcc_s.so.1...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
#0  0x00002aaaab781f80 in ?? ()
(gdb) where
#0  0x00002aaaab781f80 in ?? ()
#1  0x00002aaaaacc9fa8 in __nptl_deallocate_tsd () from /lib/libpthread.so.0
#2  0x00002aaaaacca108 in start_thread () from /lib/libpthread.so.0
#3  0x00002aaaab438ce2 in clone () from /lib/libc.so.6
#4  0x0000000000000000 in ?? ()
(gdb) q

lothar@zaphod$ valgrind ./bug2980
==12097== Memcheck, a memory error detector.
==12097== Copyright (C) 2002-2005, and GNU GPL'd, by Julian Seward et al.
==12097== Using LibVEX rev 1471, a library for dynamic binary translation.
==12097== Copyright (C) 2004-2005, and GNU GPL'd, by OpenWorks LLP.
==12097== Using valgrind-3.1.0-Debian, a dynamic binary instrumentation framework.
==12097== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et al.
==12097== For more details, rerun with: -v
==12097==
main - entered
loadDll - entered
==12097== Thread 2:
==12097== Invalid read of size 8
==12097==    at 0x401064A: (within /lib/ld-2.3.6.so)
==12097==    by 0x40089BC: (within /lib/ld-2.3.6.so)
==12097==    by 0x4004DF3: (within /lib/ld-2.3.6.so)
==12097==    by 0x4006612: (within /lib/ld-2.3.6.so)
==12097==    by 0x53C54FB: (within /lib/libc-2.3.6.so)
==12097==    by 0x400B13F: (within /lib/ld-2.3.6.so)
==12097==    by 0x53C60A9: _dl_open (in /lib/libc-2.3.6.so)
==12097==    by 0x4B1F043: (within /lib/libdl-2.3.6.so)
==12097==    by 0x400B13F: (within /lib/ld-2.3.6.so)
==12097==    by 0x4B1F541: (within /lib/libdl-2.3.6.so)
==12097==    by 0x4B1F081: dlopen (in /lib/libdl-2.3.6.so)
==12097==    by 0x400EE1: loadDll() (bug2980.cpp:18)
==12097==  Address 0x5D06138 is 8 bytes inside a block of size 13 alloc'd
==12097==    at 0x4A19A16: malloc (vg_replace_malloc.c:149)
==12097==    by 0x40061D2: (within /lib/ld-2.3.6.so)
==12097==    by 0x40068D7: (within /lib/ld-2.3.6.so)
==12097==    by 0x53C54FB: (within /lib/libc-2.3.6.so)
==12097==    by 0x400B13F: (within /lib/ld-2.3.6.so)
==12097==    by 0x53C60A9: _dl_open (in /lib/libc-2.3.6.so)
==12097==    by 0x4B1F043: (within /lib/libdl-2.3.6.so)
==12097==    by 0x400B13F: (within /lib/ld-2.3.6.so)
==12097==    by 0x4B1F541: (within /lib/libdl-2.3.6.so)
==12097==    by 0x4B1F081: dlopen (in /lib/libdl-2.3.6.so)
==12097==    by 0x400EE1: loadDll() (bug2980.cpp:18)
==12097==    by 0x4010EE: loadunloadDll(void*) (bug2980.cpp:64)
==12097==
==12097== Conditional jump or move depends on uninitialised value(s)
==12097==    at 0x4010531: (within /lib/ld-2.3.6.so)
==12097==    by 0x4008A9E: (within /lib/ld-2.3.6.so)
==12097==    by 0x4004DF3: (within /lib/ld-2.3.6.so)
==12097==    by 0x4006612: (within /lib/ld-2.3.6.so)
==12097==    by 0x53C54FB: (within /lib/libc-2.3.6.so)
==12097==    by 0x400B13F: (within /lib/ld-2.3.6.so)
==12097==    by 0x53C60A9: _dl_open (in /lib/libc-2.3.6.so)
==12097==    by 0x4B1F043: (within /lib/libdl-2.3.6.so)
==12097==    by 0x400B13F: (within /lib/ld-2.3.6.so)
==12097==    by 0x4B1F541: (within /lib/libdl-2.3.6.so)
==12097==    by 0x4B1F081: dlopen (in /lib/libdl-2.3.6.so)
==12097==    by 0x400EE1: loadDll() (bug2980.cpp:18)
==12097==
==12097== Invalid read of size 8
==12097==    at 0x401067E: (within /lib/ld-2.3.6.so)
==12097==    by 0x40089BC: (within /lib/ld-2.3.6.so)
==12097==    by 0x4004DF3: (within /lib/ld-2.3.6.so)
==12097==    by 0x4006612: (within /lib/ld-2.3.6.so)
==12097==    by 0x4009C2C: (within /lib/ld-2.3.6.so)
==12097==    by 0x400B13F: (within /lib/ld-2.3.6.so)
==12097==    by 0x4009F32: (within /lib/ld-2.3.6.so)
==12097==    by 0x53C555A: (within /lib/libc-2.3.6.so)
==12097==    by 0x400B13F: (within /lib/ld-2.3.6.so)
==12097==    by 0x53C60A9: _dl_open (in /lib/libc-2.3.6.so)
==12097==    by 0x4B1F043: (within /lib/libdl-2.3.6.so)
==12097==    by 0x400B13F: (within /lib/ld-2.3.6.so)
==12097==  Address 0x5D06708 is 56 bytes inside a block of size 62 alloc'd
==12097==    at 0x4A19A16: malloc (vg_replace_malloc.c:149)
==12097==    by 0x40061D2: (within /lib/ld-2.3.6.so)
==12097==    by 0x40068D7: (within /lib/ld-2.3.6.so)
==12097==    by 0x4009C2C: (within /lib/ld-2.3.6.so)
==12097==    by 0x400B13F: (within /lib/ld-2.3.6.so)
==12097==    by 0x4009F32: (within /lib/ld-2.3.6.so)
==12097==    by 0x53C555A: (within /lib/libc-2.3.6.so)
==12097==    by 0x400B13F: (within /lib/ld-2.3.6.so)
==12097==    by 0x53C60A9: _dl_open (in /lib/libc-2.3.6.so)
==12097==    by 0x4B1F043: (within /lib/libdl-2.3.6.so)
==12097==    by 0x400B13F: (within /lib/ld-2.3.6.so)
==12097==    by 0x4B1F541: (within /lib/libdl-2.3.6.so)
==12097==
==12097== Conditional jump or move depends on uninitialised value(s)
==12097==    at 0x4008F11: (within /lib/ld-2.3.6.so)
==12097==    by 0x53C5646: (within /lib/libc-2.3.6.so)
==12097==    by 0x400B13F: (within /lib/ld-2.3.6.so)
==12097==    by 0x53C60A9: _dl_open (in /lib/libc-2.3.6.so)
==12097==    by 0x4B1F043: (within /lib/libdl-2.3.6.so)
==12097==    by 0x400B13F: (within /lib/ld-2.3.6.so)
==12097==    by 0x4B1F541: (within /lib/libdl-2.3.6.so)
==12097==    by 0x4B1F081: dlopen (in /lib/libdl-2.3.6.so)
==12097==    by 0x400EE1: loadDll() (bug2980.cpp:18)
==12097==    by 0x4010EE: loadunloadDll(void*) (bug2980.cpp:64)
==12097==    by 0x4C260F9: start_thread (in /lib/libpthread-2.3.6.so)
==12097==    by 0x5393CE1: clone (in /lib/libc-2.3.6.so)
==12097==
==12097== Conditional jump or move depends on uninitialised value(s)
==12097==    at 0x4008F51: (within /lib/ld-2.3.6.so)
==12097==    by 0x53C5646: (within /lib/libc-2.3.6.so)
==12097==    by 0x400B13F: (within /lib/ld-2.3.6.so)
==12097==    by 0x53C60A9: _dl_open (in /lib/libc-2.3.6.so)
==12097==    by 0x4B1F043: (within /lib/libdl-2.3.6.so)
==12097==    by 0x400B13F: (within /lib/ld-2.3.6.so)
==12097==    by 0x4B1F541: (within /lib/libdl-2.3.6.so)
==12097==    by 0x4B1F081: dlopen (in /lib/libdl-2.3.6.so)
==12097==    by 0x400EE1: loadDll() (bug2980.cpp:18)
==12097==    by 0x4010EE: loadunloadDll(void*) (bug2980.cpp:64)
==12097==    by 0x4C260F9: start_thread (in /lib/libpthread-2.3.6.so)
==12097==    by 0x5393CE1: clone (in /lib/libc-2.3.6.so)
ACE::init()
loadDll - leaving
(12097|97540448) capi_dosomething - entered
(12097|97540448) capi_dosomething - leaving
unloadDll - entered
ACE::fini()
unloadDll - leaving
==12097==
==12097== Jump to the invalid address stated on the next line
==12097==    at 0x5FDBF80: ???
==12097==  Address 0x5FDBF80 is not stack'd, malloc'd or (recently) free'd
==12097==
==12097== Process terminating with default action of signal 11 (SIGSEGV)
==12097==  Access not within mapped region at address 0x5FDBF80
==12097==    at 0x5FDBF80: ???
==12097==
==12097== ERROR SUMMARY: 8 errors from 6 contexts (suppressed: 8 from 1)
==12097== malloc/free: in use at exit: 8,449 bytes in 3 blocks.
==12097== malloc/free: 55 allocs, 52 frees, 40,472 bytes allocated.
==12097== For counts of detected errors, rerun with: -v
==12097== searching for pointers to 3 not-freed blocks.
==12097== checked 8,688,952 bytes.
==12097==
==12097== LEAK SUMMARY:
==12097==    definitely lost: 0 bytes in 0 blocks.
==12097==      possibly lost: 136 bytes in 1 blocks.
==12097==    still reachable: 8,313 bytes in 2 blocks.
==12097==         suppressed: 0 bytes in 0 blocks.
==12097== Reachable blocks (those to which a pointer was found) are not shown.
==12097== To see them, rerun with: --show-reachable=yes
Killed
Comment 14 Johnny Willemsen 2007-07-31 14:31:42 CDT
I can reproduce the crash with the latest test case. Searching the web, I see some references suggesting that this can happen when pthread_getspecific is called after the TSD has been destroyed and the returned value is then used.
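
For background on the __nptl_deallocate_tsd frame in the backtrace above: POSIX runs the destructor registered with pthread_key_create() at thread exit for every key whose value is still non-NULL. The sketch below only demonstrates that general mechanism and is not ACE code. The names g_destructor_calls, count_cleanup, Scenario, worker and run_scenario are hypothetical. It shows that the destructor runs once if the value is left set, and not at all if the thread clears the value before exiting. Whether a stale destructor pointer into the unloaded library explains the crash here is an assumption, not something this thread confirms.

```cpp
#include <pthread.h>
#include <atomic>

// Hypothetical counter, incremented each time the key destructor runs.
static std::atomic<int> g_destructor_calls(0);

// Destructor registered for the key; called at thread exit when the
// thread's value for the key is non-NULL.
extern "C" void count_cleanup(void* value) {
    (void)value;
    ++g_destructor_calls;
}

// Hypothetical per-run parameters handed to the worker thread.
struct Scenario {
    pthread_key_t key;
    bool clear_before_exit;
};

extern "C" void* worker(void* arg) {
    Scenario* scenario = static_cast<Scenario*>(arg);
    static int dummy = 42;  // any non-NULL value will do
    pthread_setspecific(scenario->key, &dummy);
    if (scenario->clear_before_exit) {
        // A NULL value means the destructor is skipped at thread exit.
        pthread_setspecific(scenario->key, nullptr);
    }
    return nullptr;
}

// Returns the number of destructor calls observed, or -1 on setup failure.
int run_scenario(bool clear_before_exit) {
    g_destructor_calls = 0;
    Scenario scenario = {};
    scenario.clear_before_exit = clear_before_exit;
    if (pthread_key_create(&scenario.key, count_cleanup) != 0) {
        return -1;
    }
    pthread_t thread;
    if (pthread_create(&thread, nullptr, worker, &scenario) != 0) {
        pthread_key_delete(scenario.key);
        return -1;
    }
    pthread_join(thread, nullptr);  // destructors have run by the time join returns
    pthread_key_delete(scenario.key);
    return g_destructor_calls.load();
}
```

If the destructor's code lived in a shared library that had already been unmapped with dlclose(), the call made at thread exit would target an address that is no longer mapped.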
Comment 15 Johnny Willemsen 2007-07-31 14:44:33 CDT
[build@shelob bug2980]$ valgrind  --leak-check=full -v ./bug2980
==10400== Memcheck, a memory error detector.
==10400== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
==10400== Using LibVEX rev 1658, a library for dynamic binary translation.
==10400== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
==10400== Using valgrind-3.2.1, a dynamic binary instrumentation framework.
==10400== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
==10400==
--10400-- Command line
--10400--    ./bug2980
--10400-- Startup, with flags:
--10400--    --leak-check=full
--10400--    -v
--10400-- Contents of /proc/version:
--10400--   Linux version 2.6.18-8.1.8.el5 (brewbuilder@hs20-bc2-3.build.redhat.com) (gcc version 4.1.1 20070105 (Red Hat 4.1.1-52)) #1 SMP Mon Jun 25 17:06:07 EDT 2007
--10400-- Arch and hwcaps: AMD64, amd64-sse2
--10400-- Valgrind library directory: /usr/lib64/valgrind
--10400-- Reading syms from /home/build/ACE/RT1975/ACE_wrappers/tests/bug2980/bug2980/bug2980 (0x400000)
--10400-- Reading syms from /usr/lib64/valgrind/amd64-linux/memcheck (0x38000000)
--10400--    object doesn't have a dynamic symbol table
--10400-- Reading syms from /lib64/ld-2.5.so (0x36E5C00000)
--10400-- Reading suppressions file: /usr/lib64/valgrind/default.supp
--10400-- Reading syms from /usr/lib64/valgrind/amd64-linux/vgpreload_core.so (0x4802000)
--10400-- Reading syms from /usr/lib64/valgrind/amd64-linux/vgpreload_memcheck.so (0x4A03000)
--10400-- REDIR: 0x36E5C13D80 (index) redirected to 0x4A06550 (index)
--10400-- REDIR: 0x36E5C13F30 (strcmp) redirected to 0x4A067D0 (strcmp)
--10400-- REDIR: 0x36E5C13F60 (strlen) redirected to 0x4A06700 (strlen)
--10400-- Reading syms from /lib64/libdl-2.5.so (0x36E6800000)
--10400-- Reading syms from /lib64/libpthread-2.5.so (0x36E6C00000)
--10400-- Reading syms from /lib64/librt-2.5.so (0x36E7400000)
--10400-- Reading syms from /usr/lib64/libstdc++.so.6.0.8 (0x36F7A00000)
--10400--    object doesn't have a symbol table
--10400-- Reading syms from /lib64/libm-2.5.so (0x36E6400000)
--10400-- Reading syms from /lib64/libgcc_s-4.1.1-20070105.so.1 (0x36F5600000)
--10400--    object doesn't have a symbol table
--10400-- Reading syms from /lib64/libc-2.5.so (0x36E6000000)
--10400-- REDIR: 0x36E60765C0 (memset) redirected to 0x4A06920 (memset)
--10400-- REDIR: 0x36E6076CE0 (memcpy) redirected to 0x4A06FF0 (memcpy)
--10400-- REDIR: 0x36E6075710 (rindex) redirected to 0x4A06400 (rindex)
--10400-- REDIR: 0x36E6075320 (strlen) redirected to 0x4A066C0 (strlen)
main - entered
--10400-- REDIR: 0x36E60702F0 (calloc) redirected to 0x4A04AAC (calloc)
--10400-- REDIR: 0x36E6072110 (realloc) redirected to 0x4A05838 (realloc)
loadDll - entered
--10400-- REDIR: 0x36E6070660 (malloc) redirected to 0x4A05787 (malloc)
--10400-- Reading syms from /home/build/ACE/RT1975/ACE_wrappers/tests/bug2980/bug2980/libcapi.so.5.5.10 (0x5727000)
--10400-- Reading syms from /home/build/ACE/icc10_64/ACE_wrappers/ace/libACE.so.5.5.10 (0x5928000)
--10400-- Reading syms from /opt/intel/cce/10.0.025/lib/libimf.so (0x5DF1000)
--10400-- Reading syms from /opt/intel/cce/10.0.025/lib/libsvml.so (0x6152000)
--10400-- Reading syms from /opt/intel/cce/10.0.025/lib/libintlc.so.5 (0x62D3000)
--10400-- Reading syms from /opt/intel/cce/10.0.025/lib/libcxaguard.so.5 (0x640C000)
--10400-- REDIR: 0x36F7ABD110 (operator new(unsigned long)) redirected to 0x4A05F95 (operator new(unsigned long))
--10400-- REDIR: 0x36E6075E10 (memchr) redirected to 0x4A06850 (memchr)
--10400-- REDIR: 0x36E6071F30 (free) redirected to 0x4A05397 (free)
--10400-- REDIR: 0x36F7ABD240 (operator new[](unsigned long)) redirected to 0x4A05C35 (operator new[](unsigned long))
--10400-- REDIR: 0x36E60755A0 (strncmp) redirected to 0x4A06720 (strncmp)
ACE::init()
loadDll - leaving
--10400-- REDIR: 0x36E6075500 (strncat) redirected to 0x4A06580 (strncat)
--10400-- REDIR: 0xFFFFFFFFFF600000 (???) redirected to 0x380279D7 (???)
--10400-- REDIR: 0x36E6077540 (rawmemchr) redirected to 0x4A069E0 (rawmemchr)
(10400|90335552) capi_dosomething - entered
--10400-- REDIR: 0x36F7ABBF30 (operator delete[](void*)) redirected to 0x4A04D25 (operator delete[](void*))
(10400|90335552) capi_dosomething - leaving
unloadDll - entered
--10400-- REDIR: 0x36F7ABBEF0 (operator delete(void*)) redirected to 0x4A050A9 (operator delete(void*))
ACE::fini()
--10400-- Discarding syms at 0x5727000-0x5928000 in /home/build/ACE/RT1975/ACE_wrappers/tests/bug2980/bug2980/libcapi.so.5.5.10 due to munmap()
--10400-- Discarding syms at 0x5928000-0x5DF1000 in /home/build/ACE/icc10_64/ACE_wrappers/ace/libACE.so.5.5.10 due to munmap()
--10400-- Discarding syms at 0x5DF1000-0x6152000 in /opt/intel/cce/10.0.025/lib/libimf.so due to munmap()
--10400-- Discarding syms at 0x6152000-0x62D3000 in /opt/intel/cce/10.0.025/lib/libsvml.so due to munmap()
--10400-- Discarding syms at 0x62D3000-0x640C000 in /opt/intel/cce/10.0.025/lib/libintlc.so.5 due to munmap()
--10400-- Discarding syms at 0x640C000-0x650D000 in /opt/intel/cce/10.0.025/lib/libcxaguard.so.5 due to munmap()
unloadDll - leaving
==10400== Thread 2:
==10400== Jump to the invalid address stated on the next line
==10400==    at 0x5AC789A: ???
==10400==  Address 0x5AC789A is not stack'd, malloc'd or (recently) free'd
==10400==
==10400== Process terminating with default action of signal 11 (SIGSEGV)
==10400==  Access not within mapped region at address 0x5AC789A
==10400==    at 0x5AC789A: ???
==10400==
==10400== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 22 from 1)
==10400==
==10400== 1 errors in context 1 of 1:
==10400== Jump to the invalid address stated on the next line
==10400==    at 0x5AC789A: ???
==10400==  Address 0x5AC789A is not stack'd, malloc'd or (recently) free'd
--10400--
--10400-- supp:   22 Fedora-Core-6-hack3-ld25
==10400==
==10400== IN SUMMARY: 1 errors from 1 contexts (suppressed: 22 from 1)
==10400==
==10400== malloc/free: in use at exit: 10,649 bytes in 4 blocks.
==10400== malloc/free: 72 allocs, 68 frees, 47,107 bytes allocated.
==10400==
==10400== searching for pointers to 4 not-freed blocks.
==10400== checked 10,667,392 bytes.
==10400==
==10400== Thread 1:
==10400==
==10400== 288 bytes in 1 blocks are possibly lost in loss record 1 of 4
==10400==    at 0x4A04B32: calloc (vg_replace_malloc.c:279)
==10400==    by 0x36E5C0FC82: _dl_allocate_tls (in /lib64/ld-2.5.so)
==10400==    by 0x36E6C06904: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.5.so)
==10400==    by 0x400F4E: main (bug2980.cpp:83)
==10400==
==10400==
==10400== 8,313 (4,216 direct, 4,097 indirect) bytes in 1 blocks are definitely lost in loss record 4 of 4
==10400==    at 0x4A06019: operator new(unsigned long) (vg_replace_malloc.c:167)
==10400==    by 0x5AA1F90: ???
==10400==    by 0x5B050CE: ???
==10400==    by 0x5A8A61C: ???
==10400==    by 0x5AC3771: ???
==10400==    by 0x5AC3652: ???
==10400==    by 0x5AD198A: ???
==10400==    by 0x5AD19C1: ???
==10400==    by 0x5AD199A: ???
==10400==    by 0x5B09B55: ???
==10400==    by 0x5A3D3FA: ???
==10400==
==10400== LEAK SUMMARY:
==10400==    definitely lost: 4,216 bytes in 1 blocks.
==10400==    indirectly lost: 4,097 bytes in 1 blocks.
==10400==      possibly lost: 288 bytes in 1 blocks.
==10400==    still reachable: 2,048 bytes in 1 blocks.
==10400==         suppressed: 0 bytes in 0 blocks.
==10400== Reachable blocks (those to which a pointer was found) are not shown.
==10400== To see them, rerun with: --show-reachable=yes
--10400--  memcheck: sanity checks: 26 cheap, 2 expensive
--10400--  memcheck: auxmaps: 278 auxmap entries (17792k, 17M) in use
--10400--  memcheck: auxmaps: 3527715 searches, 15791142 comparisons
--10400--  memcheck: SMs: n_issued      = 33 (528k, 0M)
--10400--  memcheck: SMs: n_deissued    = 5 (80k, 0M)
--10400--  memcheck: SMs: max_noaccess  = 524287 (8388592k, 8191M)
--10400--  memcheck: SMs: max_undefined = 0 (0k, 0M)
--10400--  memcheck: SMs: max_defined   = 704 (11264k, 11M)
--10400--  memcheck: SMs: max_non_DSM   = 33 (528k, 0M)
--10400--  memcheck: max sec V bit nodes:    0 (0k, 0M)
--10400--  memcheck: set_sec_vbits8 calls: 0 (new: 0, updates: 0)
--10400--  memcheck: max shadow mem size:   4672k, 4M
--10400-- translate:            fast SP updates identified: 4,387 ( 89.6%)
--10400-- translate:   generic_known SP updates identified: 400 (  8.1%)
--10400-- translate: generic_unknown SP updates identified: 109 (  2.2%)
--10400--     tt/tc: 9,847 tt lookups requiring 10,123 probes
--10400--     tt/tc: 9,846 fast-cache updates, 9 flushes
--10400--  transtab: new        4,602 (100,448 -> 1,911,798; ratio 190:10) [0 scs]
--10400--  transtab: dumped     0 (0 -> ??)
--10400--  transtab: discarded  1,463 (25,055 -> ??)
--10400-- scheduler: 2,684,169 jumps (bb entries).
--10400-- scheduler: 26/5,666 major/minor sched events.
--10400--    sanity: 27 cheap, 2 expensive checks.
--10400--    exectx: 30,011 lists, 101 contexts (avg 0 per list)
--10400--    exectx: 163 searches, 80 full compares (490 per 1000)
--10400--    exectx: 0 cmp2, 66 cmp4, 0 cmpAll
Killed
[build@shelob bug2980]$
Comment 16 Lothar Werzinger 2007-07-31 14:58:45 CDT
Commenting out the ACE::init() and ACE::fini() calls does not change the outcome when I run the updated test here.

main - entered
loadDll - entered
loadDll - leaving
(14254|1082132832) capi_dosomething - entered
(14254|1082132832) capi_dosomething - leaving
unloadDll - entered
unloadDll - leaving
./build.sh: line 82: 14254 Segmentation fault      (core dumped) ./${bugname}

lothar@zaphod$ gdb --core core* bug2980
GNU gdb 6.4-debian
Copyright 2005 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".

Core was generated by `./bug2980'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libpthread.so.0...done.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/librt.so.1...done.
Loaded symbols for /lib/librt.so.1
Reading symbols from /usr/lib/libstdc++.so.6...done.
Loaded symbols for /usr/lib/libstdc++.so.6
Reading symbols from /lib/libm.so.6...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libgcc_s.so.1...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
#0  0x00002aaaab781f80 in ?? ()
(gdb) where
#0  0x00002aaaab781f80 in ?? ()
#1  0x00002aaaaacc9fa8 in __nptl_deallocate_tsd () from /lib/libpthread.so.0
#2  0x00002aaaaacca108 in start_thread () from /lib/libpthread.so.0
#3  0x00002aaaab438ce2 in clone () from /lib/libc.so.6
#4  0x0000000000000000 in ?? ()
(gdb)    

==14797== Memcheck, a memory error detector.
==14797== Copyright (C) 2002-2005, and GNU GPL'd, by Julian Seward et al.
==14797== Using LibVEX rev 1471, a library for dynamic binary translation.
==14797== Copyright (C) 2004-2005, and GNU GPL'd, by OpenWorks LLP.
==14797== Using valgrind-3.1.0-Debian, a dynamic binary instrumentation framework.
==14797== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et al.
==14797== For more details, rerun with: -v
==14797==
main - entered
loadDll - entered
==14797== Thread 2:
==14797== Invalid read of size 8
==14797==    at 0x401064A: (within /lib/ld-2.3.6.so)
==14797==    by 0x40089BC: (within /lib/ld-2.3.6.so)
==14797==    by 0x4004DF3: (within /lib/ld-2.3.6.so)
==14797==    by 0x4006612: (within /lib/ld-2.3.6.so)
==14797==    by 0x53C54FB: (within /lib/libc-2.3.6.so)
==14797==    by 0x400B13F: (within /lib/ld-2.3.6.so)
==14797==    by 0x53C60A9: _dl_open (in /lib/libc-2.3.6.so)
==14797==    by 0x4B1F043: (within /lib/libdl-2.3.6.so)
==14797==    by 0x400B13F: (within /lib/ld-2.3.6.so)
==14797==    by 0x4B1F541: (within /lib/libdl-2.3.6.so)
==14797==    by 0x4B1F081: dlopen (in /lib/libdl-2.3.6.so)
==14797==    by 0x400EE1: loadDll() (bug2980.cpp:18)
==14797==  Address 0x5D06138 is 8 bytes inside a block of size 13 alloc'd
==14797==    at 0x4A19A16: malloc (vg_replace_malloc.c:149)
==14797==    by 0x40061D2: (within /lib/ld-2.3.6.so)
==14797==    by 0x40068D7: (within /lib/ld-2.3.6.so)
==14797==    by 0x53C54FB: (within /lib/libc-2.3.6.so)
==14797==    by 0x400B13F: (within /lib/ld-2.3.6.so)
==14797==    by 0x53C60A9: _dl_open (in /lib/libc-2.3.6.so)
==14797==    by 0x4B1F043: (within /lib/libdl-2.3.6.so)
==14797==    by 0x400B13F: (within /lib/ld-2.3.6.so)
==14797==    by 0x4B1F541: (within /lib/libdl-2.3.6.so)
==14797==    by 0x4B1F081: dlopen (in /lib/libdl-2.3.6.so)
==14797==    by 0x400EE1: loadDll() (bug2980.cpp:18)
==14797==    by 0x4010EE: loadunloadDll(void*) (bug2980.cpp:64)
==14797==
==14797== Conditional jump or move depends on uninitialised value(s)
==14797==    at 0x4010531: (within /lib/ld-2.3.6.so)
==14797==    by 0x4008A9E: (within /lib/ld-2.3.6.so)
==14797==    by 0x4004DF3: (within /lib/ld-2.3.6.so)
==14797==    by 0x4006612: (within /lib/ld-2.3.6.so)
==14797==    by 0x53C54FB: (within /lib/libc-2.3.6.so)
==14797==    by 0x400B13F: (within /lib/ld-2.3.6.so)
==14797==    by 0x53C60A9: _dl_open (in /lib/libc-2.3.6.so)
==14797==    by 0x4B1F043: (within /lib/libdl-2.3.6.so)
==14797==    by 0x400B13F: (within /lib/ld-2.3.6.so)
==14797==    by 0x4B1F541: (within /lib/libdl-2.3.6.so)
==14797==    by 0x4B1F081: dlopen (in /lib/libdl-2.3.6.so)
==14797==    by 0x400EE1: loadDll() (bug2980.cpp:18)
==14797==
==14797== Invalid read of size 8
==14797==    at 0x401067E: (within /lib/ld-2.3.6.so)
==14797==    by 0x40089BC: (within /lib/ld-2.3.6.so)
==14797==    by 0x4004DF3: (within /lib/ld-2.3.6.so)
==14797==    by 0x4006612: (within /lib/ld-2.3.6.so)
==14797==    by 0x4009C2C: (within /lib/ld-2.3.6.so)
==14797==    by 0x400B13F: (within /lib/ld-2.3.6.so)
==14797==    by 0x4009F32: (within /lib/ld-2.3.6.so)
==14797==    by 0x53C555A: (within /lib/libc-2.3.6.so)
==14797==    by 0x400B13F: (within /lib/ld-2.3.6.so)
==14797==    by 0x53C60A9: _dl_open (in /lib/libc-2.3.6.so)
==14797==    by 0x4B1F043: (within /lib/libdl-2.3.6.so)
==14797==    by 0x400B13F: (within /lib/ld-2.3.6.so)
==14797==  Address 0x5D06708 is 56 bytes inside a block of size 62 alloc'd
==14797==    at 0x4A19A16: malloc (vg_replace_malloc.c:149)
==14797==    by 0x40061D2: (within /lib/ld-2.3.6.so)
==14797==    by 0x40068D7: (within /lib/ld-2.3.6.so)
==14797==    by 0x4009C2C: (within /lib/ld-2.3.6.so)
==14797==    by 0x400B13F: (within /lib/ld-2.3.6.so)
==14797==    by 0x4009F32: (within /lib/ld-2.3.6.so)
==14797==    by 0x53C555A: (within /lib/libc-2.3.6.so)
==14797==    by 0x400B13F: (within /lib/ld-2.3.6.so)
==14797==    by 0x53C60A9: _dl_open (in /lib/libc-2.3.6.so)
==14797==    by 0x4B1F043: (within /lib/libdl-2.3.6.so)
==14797==    by 0x400B13F: (within /lib/ld-2.3.6.so)
==14797==    by 0x4B1F541: (within /lib/libdl-2.3.6.so)
==14797==
==14797== Conditional jump or move depends on uninitialised value(s)
==14797==    at 0x4008F11: (within /lib/ld-2.3.6.so)
==14797==    by 0x53C5646: (within /lib/libc-2.3.6.so)
==14797==    by 0x400B13F: (within /lib/ld-2.3.6.so)
==14797==    by 0x53C60A9: _dl_open (in /lib/libc-2.3.6.so)
==14797==    by 0x4B1F043: (within /lib/libdl-2.3.6.so)
==14797==    by 0x400B13F: (within /lib/ld-2.3.6.so)
==14797==    by 0x4B1F541: (within /lib/libdl-2.3.6.so)
==14797==    by 0x4B1F081: dlopen (in /lib/libdl-2.3.6.so)
==14797==    by 0x400EE1: loadDll() (bug2980.cpp:18)
==14797==    by 0x4010EE: loadunloadDll(void*) (bug2980.cpp:64)
==14797==    by 0x4C260F9: start_thread (in /lib/libpthread-2.3.6.so)
==14797==    by 0x5393CE1: clone (in /lib/libc-2.3.6.so)
==14797==
==14797== Conditional jump or move depends on uninitialised value(s)
==14797==    at 0x4008F51: (within /lib/ld-2.3.6.so)
==14797==    by 0x53C5646: (within /lib/libc-2.3.6.so)
==14797==    by 0x400B13F: (within /lib/ld-2.3.6.so)
==14797==    by 0x53C60A9: _dl_open (in /lib/libc-2.3.6.so)
==14797==    by 0x4B1F043: (within /lib/libdl-2.3.6.so)
==14797==    by 0x400B13F: (within /lib/ld-2.3.6.so)
==14797==    by 0x4B1F541: (within /lib/libdl-2.3.6.so)
==14797==    by 0x4B1F081: dlopen (in /lib/libdl-2.3.6.so)
==14797==    by 0x400EE1: loadDll() (bug2980.cpp:18)
==14797==    by 0x4010EE: loadunloadDll(void*) (bug2980.cpp:64)
==14797==    by 0x4C260F9: start_thread (in /lib/libpthread-2.3.6.so)
==14797==    by 0x5393CE1: clone (in /lib/libc-2.3.6.so)
loadDll - leaving
(14797|97540448) capi_dosomething - entered
(14797|97540448) capi_dosomething - leaving
unloadDll - entered
unloadDll - leaving
==14797==
==14797== Jump to the invalid address stated on the next line
==14797==    at 0x5FDBF80: ???
==14797==  Address 0x5FDBF80 is not stack'd, malloc'd or (recently) free'd
==14797==
==14797== Process terminating with default action of signal 11 (SIGSEGV)
==14797==  Access not within mapped region at address 0x5FDBF80
==14797==    at 0x5FDBF80: ???
==14797==
==14797== ERROR SUMMARY: 8 errors from 6 contexts (suppressed: 8 from 1)
==14797== malloc/free: in use at exit: 8,449 bytes in 3 blocks.
==14797== malloc/free: 55 allocs, 52 frees, 40,472 bytes allocated.
==14797== For counts of detected errors, rerun with: -v
==14797== searching for pointers to 3 not-freed blocks.
==14797== checked 8,688,952 bytes.
==14797==
==14797== LEAK SUMMARY:
==14797==    definitely lost: 0 bytes in 0 blocks.
==14797==      possibly lost: 136 bytes in 1 blocks.
==14797==    still reachable: 8,313 bytes in 2 blocks.
==14797==         suppressed: 0 bytes in 0 blocks.
==14797== Reachable blocks (those to which a pointer was found) are not shown.
==14797== To see them, rerun with: --show-reachable=yes

Comment 17 Iliyan Jeliazkov 2007-07-31 15:20:21 CDT
(In reply to comment #16)
> commenting out the ACE::init() and ACE::fini() calls does not change the
> outcome when I run the updated test here.


Lothar,

Thanks for running the test on your platform. The stack traces, however, are a bit difficult to read. Do you have debug info enabled?

Thanks,

--Iliyan 
Comment 18 Lothar Werzinger 2007-07-31 15:35:41 CDT
I use
make -f GNUmakefile debug=1 optimize=0 clean all
to build the test case (see build.sh).
Is there anything else I can do?
Comment 19 Johnny Willemsen 2007-08-01 04:44:25 CDT
When I define ACE_HAS_BROKEN_THREAD_KEYFREE in my config.h file on Linux and rebuild, the crash is gone. With this define, additional cleanup is done in Log_Msg.cpp.
Comment 20 Johnny Willemsen 2007-08-01 05:02:27 CDT
We should migrate the test case to tests/Unload_libACE.cpp. That program seems to do much the same. I am going to revert the define and then try to merge the test into this file.
Comment 21 Johnny Willemsen 2007-08-01 05:19:27 CDT
The difference is that it loads the ACE DLL itself; we now have a case where we load a DLL that uses ACE.
Comment 22 Johnny Willemsen 2007-08-01 05:57:55 CDT
At least on Linux, with this define the specializations of ACE_TSS seem not to be needed anymore. Lothar, can you retest your full app with the define I mentioned in comment 19 set?
Comment 23 Iliyan Jeliazkov 2007-08-01 07:36:40 CDT
(In reply to comment #21)
> Difference is that it loads the ace dll, we now have a case where we load a 
> dll that uses ace

Do you mean we load a dll that uses ACE already _linked_ in the main process?



Comment 24 Johnny Willemsen 2007-08-01 09:47:44 CDT
I am not sure what you mean, but Lothar's test case is slightly different from the one in the repo. Let us first wait for Lothar's test results with the define I uncovered.

Comment 25 Lothar Werzinger 2007-08-01 15:30:07 CDT
This test needs a main program that does NOT link against ACE.
If the main program links against ACE the error does not show.

If I recompile ACE/TAO with ACE_HAS_BROKEN_THREAD_KEYFREE defined the problem goes away.

Is this a general ACE_TSS issue? Why do we have to use the ACE_HAS_BROKEN_THREAD_KEYFREE workaround?
Comment 26 Patrick Bennett 2007-08-01 16:42:43 CDT
This isn't a 'broken' TSS issue.  It's a design problem.
Still, I take it the problem with ACE_Service_Config went away with the template specialization but now Log_Msg is choking up the works?
Comment 27 Rich Seibel 2007-08-06 15:53:44 CDT
I have done some additional investigation on this bug.

First, I reproduced the bug with the code supplied by
Lothar as the modified test case.

It didn't appear to have anything to do with the 
changes Iliyan made for the ORB-local configuration.

So I confirmed that the problem is present before
any of the changes made by Iliyan, that is I confirmed
that the problem exists using version 5.5.1.

Patrick Bennett noted that it's a design problem, 
not a problem with TSS.  I believe that is the case.

Though it seems natural to be able to create a 
thread and then use that thread to both load and
unload a bit of code, this will not work if any
Thread Specific Storage is used.  I will explain.
The underlying TSS mechanism uses a callback to
allow the user to clean up any dynamically 
allocated storage attached to a TSS key. So, if
a program uses TSS from within the dynamically
loaded code, that code must still be present
when the callback is invoked, which is at the
time that the thread exits.

To confirm this analysis, I modified the example
by moving the unloadDll call to after the thread
join.  The example then worked both with and
without Iliyan's changes.  Nor did it need
the template specialization patch Iliyan proposed.
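
The callback timing described above can be sketched with plain pthreads. This is only an illustration, not ACE code: the names demo_key, demo_destructor and run_tss_destructor_demo are made up, and no library is actually loaded or unloaded. It shows that the destructor registered with pthread_key_create() runs while the thread is exiting, after the thread function has returned, which is why the code behind it must still be mapped at that point.

```cpp
#include <pthread.h>
#include <cassert>

// Hypothetical demo names; none of this is ACE code.
static pthread_key_t demo_key;
static int destructor_calls = 0;

// Runs in the exiting thread, after the thread function has returned.
// If this function lived in a library that had already been dlclose()d,
// the call would jump into unmapped memory.
static void demo_destructor (void *value)
{
  delete static_cast<int *> (value);
  ++destructor_calls;
}

static void *demo_thread (void *)
{
  // Attach heap storage to the key for this thread only.
  int rc = pthread_setspecific (demo_key, new int (42));
  assert (rc == 0);
  (void) rc;
  // The destructor has not run yet; it only runs at thread exit.
  assert (destructor_calls == 0);
  return 0;
}

// Returns how many times the destructor ran for one worker thread.
int run_tss_destructor_demo ()
{
  destructor_calls = 0;
  int rc = pthread_key_create (&demo_key, demo_destructor);
  assert (rc == 0);
  pthread_t tid;
  rc = pthread_create (&tid, 0, demo_thread, 0);
  assert (rc == 0);
  rc = pthread_join (tid, 0);   // the thread, including its TSS cleanup, is done
  assert (rc == 0);
  (void) rc;
  pthread_key_delete (demo_key);
  return destructor_calls;
}
```

In the failing test case the equivalent of demo_destructor lives inside the library that was unloaded before the thread exited; unloading only after the join keeps it mapped long enough.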
Comment 28 Steve Totten 2007-08-06 16:01:30 CDT
(In reply to comment #27)
> I have done some additional investigation on this bug.
> 
> First, I reproduced the bug with the code supplied by
> Lothar as the modified test case.:
> 
> It didn't appear to have anything to do with the 
> changes Iliyan made for the ORB-local configuration.
> 
> So I confirmed that the problem is present before
> any of the changes made by Iliyan, that is I confirmed
> that the problem exists using version 5.5.1.

Since this issue has nothing to do with Iliyan's changes, I am reassigning it
back to the general 'pool'.
Comment 29 Lothar Werzinger 2007-08-06 16:43:35 CDT
The problem is that external software is using our library (which depends on ACE and TAO) in a thread. There's nothing we can do about that. What we can do is provide functions in the library that get called after load and before unload (just like in the test provided).
Btw., they told us they load many other libraries (including database access libraries) that do not show any problems when loaded/unloaded from a thread.

I think it should be possible for ACE to clean up any use of TSS if called from a special cleanup function/method like ACE::fini().

I see no reason why ACE should misbehave if loaded/unloaded from a child thread compared to the main thread.
Comment 30 Johnny Willemsen 2007-08-07 04:57:29 CDT
(In reply to comment #27)
> Though it seems natural to be able to create a 
> thread and then use that thread to both load and
> unload a bit of code, this will not work if any
> Thread Specific Storage is used.  I will explain.
> the underlying TSS mechanism uses a callback to
> allow the user to clean up any dynamically 
> allocated storage attached to a TSS key. So, if
> a program uses TSS from within the dynamically
> loaded code, that code must still be present
> when the callback is invoked, which is at the
> time that the thread exits.

When the ACE library gets a call to ACE::fini(), couldn't it deallocate that data and then disable the callback? Then we get an explicit cleanup. I have enabled a define for the Log_Msg TSS data, and that seems to do the trick there.
Comment 31 Rich Seibel 2007-08-07 13:44:53 CDT
> ------- Comment #29 From Lothar Werzinger 2007-08-06 16:43:35 [reply] -------
> The problem is that an external SW is using our library (that depends on ACE
> and TAO) in a thread. There's nothing we can do about that. What we can do is
> to provide functions in the library that get called after load and before
> unload (just like in the test provided).
> Btw. they told us they do load many other libraries (including Database access
> libraries) that do not show any problems when loaded/unloaded from a thread.

I suspect these libraries either do not use TSS at all (most likely) or 
they do not use the cleanup callback.
> 
> I think it should be possible for ACE to cleanup any use of TSS if called from
> a special cleanup function/method like ACE::fini().

Possible yes, almost anything is possible given enough time and money.  In
this case the ACE_TSS mechanism and how it is used would have to be 
redesigned.  The use of the cleanup callback would have to be eliminated.
Then either the memory would be leaked or another mechanism created to
clean it up.  Either case would likely be intrusive on the user.
> 
> I see no reason why ACE should misbehave if loaded/unloaded from a child thread
> compared to the main thread.

See below for a suggestion.
> 
> ------- Comment #30 From Johnny Willemsen 2007-08-07 04:57:29 [reply] -------
> Couldn't when the ACE library gets a call to ACE::fini() it could deallocate
> that data 

That would be a good time to do the deallocation, however that is not
possible.  The deallocation must be done by the same thread that does the
allocation since no other thread has access to the space, that's what makes
it thread specific.

> and then disable the callback. 

There is no way to disable a callback, the only choice is to not supply
a callback in the keycreate.

> Then we get an explicit cleanup, I
> have enabled a define for the Log_Msg TSS data, that seems to do the trick
> there. 

I see that, but cannot explain how it is working.  The code that the
callback is invoking should not be present, nor the objects that it
references.

The only possibility for a solution that comes to mind is a key 
manager class with the following properties.
   1. A key manager object would have to be created global to all 
      the threads that would use it.
   2. Each thread would have to acquire the key from the manager.
      This would be reference counted in the manager.
   3. Each thread could then use getspecific/setspecific as usual.
   4. Each thread would have to release the key to the manager.
   5. The key would have to be destroyed global to all the threads
      that would use it.  The thread doing the destroy would have to
      wait for all the threads still using the key to finish, or
      not wait and leak the key and any space allocated to it
      by threads.
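
A rough sketch of the five properties listed above, using raw pthreads. The class name key_manager and all of its members are hypothetical and not part of ACE; the key is created without a destructor callback, so every thread frees its own value before releasing the key.

```cpp
#include <pthread.h>
#include <cassert>

// Hypothetical key manager; one instance is shared by all threads (property 1).
class key_manager
{
public:
  key_manager () : users_ (0)
  {
    pthread_mutex_init (&lock_, 0);
    pthread_cond_init (&idle_, 0);
    int rc = pthread_key_create (&key_, 0);   // no cleanup callback registered
    assert (rc == 0);
    (void) rc;
  }

  // Property 5: wait for all users to finish, then destroy the key.
  ~key_manager ()
  {
    wait_until_idle ();
    pthread_key_delete (key_);
    pthread_cond_destroy (&idle_);
    pthread_mutex_destroy (&lock_);
  }

  // Property 2: each thread acquires the key; usage is reference counted.
  pthread_key_t acquire ()
  {
    pthread_mutex_lock (&lock_);
    ++users_;
    pthread_mutex_unlock (&lock_);
    return key_;
  }

  // Property 4: each thread releases the key after freeing its own value.
  void release ()
  {
    pthread_mutex_lock (&lock_);
    if (--users_ == 0)
      pthread_cond_broadcast (&idle_);
    pthread_mutex_unlock (&lock_);
  }

  int users ()
  {
    pthread_mutex_lock (&lock_);
    int n = users_;
    pthread_mutex_unlock (&lock_);
    return n;
  }

  void wait_until_idle ()
  {
    pthread_mutex_lock (&lock_);
    while (users_ > 0)
      pthread_cond_wait (&idle_, &lock_);
    pthread_mutex_unlock (&lock_);
  }

private:
  key_manager (const key_manager &);             // not copyable
  key_manager &operator= (const key_manager &);  // not assignable

  pthread_mutex_t lock_;
  pthread_cond_t idle_;
  pthread_key_t key_;
  int users_;
};

static void *km_worker (void *p)
{
  key_manager *manager = static_cast<key_manager *> (p);
  pthread_key_t key = manager->acquire ();
  pthread_setspecific (key, new int (1));        // property 3: normal use
  delete static_cast<int *> (pthread_getspecific (key));
  pthread_setspecific (key, 0);                  // nothing left for thread exit
  manager->release ();
  return 0;
}

// Returns the number of registered users after one worker has finished.
int users_after_worker ()
{
  key_manager manager;
  pthread_t tid;
  int rc = pthread_create (&tid, 0, km_worker, &manager);
  assert (rc == 0);
  rc = pthread_join (tid, 0);
  assert (rc == 0);
  (void) rc;
  return manager.users ();
}
```

The sketch takes the "wait" branch of property 5; the alternative mentioned above (not waiting and leaking the key) would simply skip wait_until_idle() and pthread_key_delete().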

Comment 32 Lothar Werzinger 2007-08-07 14:59:44 CDT
I just read the man page for pthread_key_create and it states:

An optional destructor function may be associated with each key value. At thread exit, if a key value has a non-NULL destructor pointer, and the thread has a non-NULL value associated with that key, the value of the key is set to NULL, and then the function pointed to is called with the previously associated value as its sole argument. The order of destructor calls is unspecified if more than one destructor exists for a thread when it exits.


To me this looks like ACE could simply destruct the TSS object manually at the ACE::fini() call and then set the key value to 0. Thus it is guaranteed by pthread that the callback is NOT called.

As ACE::fini() in the test case is called from the same thread that used ACE::init(), and no other threads have been created that might still run, this should work.

It is clear that no other threads can run at that time, but that would fail anyway if you try to unload the library, as the threads' code would go away ;-)
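
A minimal sketch of that idea with plain pthreads, under the assumption that an ACE::fini()-style hook runs in the same thread that set the value. The names fini_key, explicit_fini and destructor_calls_at_exit are invented for this illustration and are not ACE APIs.

```cpp
#include <pthread.h>
#include <cassert>

static pthread_key_t fini_key;
static int fini_destructor_calls = 0;

// Destructor registered with the key; counts how often pthread invokes it.
static void fini_destructor (void *value)
{
  delete static_cast<int *> (value);
  ++fini_destructor_calls;
}

// Stand-in for what a fini()-style hook could do in the calling thread:
// free the value by hand, then reset the slot to NULL.
static void explicit_fini ()
{
  delete static_cast<int *> (pthread_getspecific (fini_key));
  pthread_setspecific (fini_key, 0);
}

static void *fini_thread (void *arg)
{
  bool call_fini = *static_cast<bool *> (arg);
  pthread_setspecific (fini_key, new int (7));
  if (call_fini)
    explicit_fini ();
  return 0;
}

// Returns how many times pthread called the destructor at thread exit.
int destructor_calls_at_exit (bool call_fini)
{
  fini_destructor_calls = 0;
  int rc = pthread_key_create (&fini_key, fini_destructor);
  assert (rc == 0);
  pthread_t tid;
  rc = pthread_create (&tid, 0, fini_thread, &call_fini);
  assert (rc == 0);
  rc = pthread_join (tid, 0);
  assert (rc == 0);
  (void) rc;
  pthread_key_delete (fini_key);
  return fini_destructor_calls;
}
```

With call_fini set, the slot is NULL when the thread exits, so the destructor is skipped; without it, the destructor runs once. Only the calling thread's slot is affected, which matches the single-thread situation in the test case.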
Comment 33 Lothar Werzinger 2007-08-07 15:12:36 CDT
In response to http://deuce.doc.wustl.edu/bugzilla/:

I don't think a key manager is required.
For all threads except for the "master" thread the TSS mechanism works as
expected, as the threads MUST terminate before the library can be unloaded.

The ONLY thread affected by this problem is the "master" thread and as I wrote
in http://deuce.doc.wustl.edu/bugzilla/ setting the TSS
value to zero does the trick.

As ACE knows all the TSS objects it uses itself in the library it should be
possible to call the destructors manually and set the values to zero from
ACE::fini()
Comment 34 Rich Seibel 2007-08-07 15:57:33 CDT
(In reply to comment #32)
> I just read the man page for pthread_key_create and it states:
> 
> An optional destructor function may be associated with each key value. At
> thread exit, if a key value has a non-NULL destructor pointer, and the thread
> has a non-NULL value associated with that key, the value of the key is set to
> NULL, and then the function pointed to is called with the previously associated
> value as its sole argument. The order of destructor calls is unspecified if
> more than one destructor exists for a thread when it exits.

That explains why ACE_HAS_BROKEN_THREAD_KEYFREE works, since when this
option is specified, code is added to specifically free the storage and set the
key to NULL.
> 
> 
> To me this looks like ACE could simply destruct the TSS object manually at the
> ACE::fini() call and then set the key value to 0. Thus it is guaranteed by
> pthread that the callback is NOT called.

I still believe that the storage needs to be deleted and key value set to zero
in each thread that uses it.  In your example, there was only one thread.
> 
> As ACE::fini() in the test case is called from the same thread that used
> ACE::init() and no other threads have been created that might still run this
> should work.

Yes, I believe it would.
> 
> It is clear that no other threads can run at that time, but that would fail
> anyway if you try to unload the library as the threads code would go away ;-)
> 
Comment 35 Lothar Werzinger 2007-08-07 17:25:35 CDT
> I still believe that the storage needs to be deleted and key value set to
> zero in each thread that uses it. 
> In your example, there was only one thread.

As already explained, there cannot be ANY other threads. The problem occurs only if the shared library gets unloaded. This CAN ONLY be done when NO threads execute code in that library any longer. As only ONE thread remains, ACE::fini() SHOULD be able to clean up and set the key value to zero.
Comment 36 Rich Seibel 2007-08-08 09:42:55 CDT
(In reply to comment #35)
> As already explained there can not be ANY other threads. The problem occurs
> only if the shared library gets unloaded. This CAN ONLY be done when NO
> threads execute code in that library any longer. As only ONE thread remains,
> ACE::fini() SHOULD be able to clean up and set the key value to zero.
> 
Give it a go.  I don't see any missing pieces.

Comment 37 Johnny Willemsen 2007-10-31 06:27:35 CDT
The fix seems to cause the memory leak in bugzilla 3108.
Comment 38 Johnny Willemsen 2007-11-01 05:57:50 CDT
I am going to remove the following code from Service_Config.h; it causes the memory leak described in bugzilla 3108, which affects all our users. This bug is still open and we do need a real patch for this problem.

/// This specialization ensures ACE_TSS will _not_ perform a delete on
/// (ACE_Service_Gestalt*) p upon thread exit, when TSS is cleaned
/// up. Note that the tss_ member will be destroyed with the
/// ACE_Object_Manager's ACE_Service_Config singleton, so no leaks
/// will be introduced.
/// We need this non-ownership ACE_TSS because the SC instance is
/// really owned by the Object Manager and only it must do the cleanup.
///
/// Naturally, things would be simpler, if we could
/// avoid using the TSS altogether but we need the ability to
/// temporarily designate a different SC instance as the "default."
/// So, the solution is a hybrid, or non-owner ACE_TSS.  See bugzilla
/// 2980 for a description of a test case where ACE_TSS::cleanup() is
/// called before ~ACE_Object_Manager.

# if defined (ACE_MT_SAFE) && (ACE_MT_SAFE != 0)
// Since ACE_TSS<>::cleanup() is only defined in
// multithreaded builds ...
template<> inline void
ACE_TSS<ACE_Service_Gestalt>::cleanup (void*ptr)
{
  // Borland C++ 2007 *needs* the parameter
  // name, but it is not clear why ...
  ACE_UNUSED_ARG (ptr);
}
# else
template<> inline
ACE_TSS<ACE_Service_Gestalt>::~ACE_TSS (void)
{
  // Without threads, the ACE_TSS cleanup is done by ~ACE_TSS()
}
# endif /* ACE_MT_SAFE */
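
To illustrate what such a specialization does, here is a small standalone sketch. mini_tss, owned_type and managed_type are hypothetical stand-ins (mini_tss models only the cleanup hook, not ACE_TSS itself). The generic cleanup deletes the object; the explicit specialization turns it into a no-op, so some other owner has to delete the object.

```cpp
#include <cassert>

// Hypothetical stand-in for ACE_TSS<TYPE>; only the cleanup hook is modeled.
template <typename TYPE>
struct mini_tss
{
  // Default policy: the TSS wrapper owns the object and deletes it.
  static void cleanup (void *ptr) { delete static_cast<TYPE *> (ptr); }
};

// Destructor counters make the effect of each policy observable.
struct owned_type
{
  static int destroyed;
  ~owned_type () { ++destroyed; }
};

struct managed_type
{
  static int destroyed;
  ~managed_type () { ++destroyed; }
};

int owned_type::destroyed = 0;
int managed_type::destroyed = 0;

// Explicit specialization: cleanup is a no-op for managed_type,
// so the object must be deleted by whoever really owns it.
template <> inline void mini_tss<managed_type>::cleanup (void *) {}

int destroyed_after_cleanup_owned ()
{
  owned_type::destroyed = 0;
  mini_tss<owned_type>::cleanup (new owned_type);   // deletes the object
  return owned_type::destroyed;
}

int destroyed_after_cleanup_managed ()
{
  managed_type::destroyed = 0;
  managed_type *object = new managed_type;
  mini_tss<managed_type>::cleanup (object);         // does nothing
  int seen = managed_type::destroyed;
  delete object;                                    // the real owner cleans up
  return seen;
}
```

If no other owner ever performs that final delete, the no-op cleanup leaves the object allocated.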


Comment 39 Johnny Willemsen 2007-11-01 09:54:09 CDT
Maybe the change below also fixes this; on Linux we do have this define set.

Thu Nov  1 14:40:00 UTC 2007  Simon Massey  <simon.massey@prismtech.com>

        * ace/OS_NS_Thread.cpp:

          Systems with ACE_HAS_BROKEN_THREAD_KEYFREE requires some
          cleanup within ACE_OS::thr_keyfree_native() otherwise they
          can crash at thread_exit if ACE is dynamically loaded.
Comment 40 Simon McQueen 2007-11-01 09:55:24 CDT
Wrong Simon.
Comment 41 Patrick Bennett 2007-11-01 10:04:48 CDT
Taking that out will break the fix for bug 2963, which is far more severe than a simple leak!
Comment 42 Johnny Willemsen 2007-11-01 10:11:00 CDT
(In reply to comment #41)
> Taking that out will break the fix for bug 2963 which is far more severe than a
> simple leak!

Bug 2963 and 2980 are both about problems that happen when an application that doesn't use ACE loads the ACE library. Bug 2980 is still considered a high-priority problem (and because of that, 2963 is also highly important). The issue is that 3108 will cause a problem for any user; the number of people affected by 2980/2963 is much smaller. The SG/SC code is still problematic and this has to be resolved asap.
Comment 43 Simon Massey 2007-11-01 10:39:01 CDT
I'm sorry but I was not aware of this bug when I looked at the failure of the
LynxOS builds' dynamic unloading of libACE.

I tracked the problem down to the dynamically loaded libACE.so,
which is unloaded prior to the tests/UnloadLibAce exit. At this exit time the
clean-up code for the already keyfreed entry for the ACE_TSS <ACE_Service_Gestalt> tss_; is re-executed because of the "broken" nature of
the thread clean-up on these systems. Since this code has already been run once at DLL unload time, and the entry for the TSS has been explicitly freed, it should not be run again (if the OS was working correctly).

On those systems where keyfree does not actually deactivate the key, it is necessary to put in a NULL pointer to stop the action from being re-applied.

I wasn't aware that you had just removed the "NULL ACTION" template specialisation (which I would have thought was still necessary, since TSS shouldn't be cleaning up the ACE_Service_Gestalt).

All my mod will do is stop the "double" cleanup from occurring, which fixes the DLL unload problem with or without your removal of the "Do Nothing" cleanup specialisation.

Basically it sounds like we should have the option of registering a null clean-up routine for the TSS so that the ACE_Service_Gestalt service pointer can be "owned" by something else. This should be a new TSS constructor parameter that stops the cleanup routine from being registered. Then you wouldn't need the "Do Nothing" specialization.
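
A sketch of what such a constructor parameter could look like, written against raw pthreads rather than the real ACE_TSS. simple_tss, its owns_objects flag and the tracked helper type are hypothetical; the ACE class does not have this parameter as far as this thread shows.

```cpp
#include <pthread.h>
#include <cassert>

// Hypothetical TSS wrapper with an ownership flag; not the ACE_TSS API.
template <typename T>
class simple_tss
{
public:
  // owns_objects == false: no cleanup routine is registered at all.
  explicit simple_tss (bool owns_objects = true)
  {
    typedef void (*destructor_fn) (void *);
    destructor_fn fn = owns_objects ? &simple_tss::cleanup
                                    : static_cast<destructor_fn> (0);
    int rc = pthread_key_create (&key_, fn);
    assert (rc == 0);
    (void) rc;
  }

  ~simple_tss () { pthread_key_delete (key_); }

  void ts_object (T *obj) { pthread_setspecific (key_, obj); }
  T *ts_object () const { return static_cast<T *> (pthread_getspecific (key_)); }

private:
  static void cleanup (void *p) { delete static_cast<T *> (p); }
  pthread_key_t key_;
};

// Counts destructor calls so ownership is observable.
struct tracked
{
  static int destroyed;
  ~tracked () { ++destroyed; }
};
int tracked::destroyed = 0;

struct demo_args
{
  simple_tss<tracked> *tss;
  tracked *object;
};

static void *tss_worker (void *p)
{
  demo_args *args = static_cast<demo_args *> (p);
  args->tss->ts_object (args->object);   // store the pointer for this thread
  return 0;
}

// Returns how many tracked objects were destroyed by thread-exit cleanup.
int destroyed_at_thread_exit (bool owns)
{
  tracked::destroyed = 0;
  simple_tss<tracked> tss (owns);
  tracked *object = new tracked;
  demo_args args = { &tss, object };
  pthread_t tid;
  int rc = pthread_create (&tid, 0, tss_worker, &args);
  assert (rc == 0);
  rc = pthread_join (tid, 0);
  assert (rc == 0);
  (void) rc;
  int count = tracked::destroyed;
  if (!owns)
    delete object;                       // the external owner cleans up
  return count;
}
```

With owns set to false nothing is called at thread exit, so there is no callback that could point into unloaded code and no need for a do-nothing specialization.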
Comment 44 Patrick Bennett 2007-11-01 11:45:03 CDT
The specialization can absolutely not be removed.  It will reintroduce a critical bug.  The pointer should never be deleted - ACE_TSS is designed to 'own' its pointers, yet it's being given a pointer to an *aggregated* member!  It's a horribly broken design.  Read my comments in bug 2963!!!
Comment 45 Johnny Willemsen 2007-11-01 15:23:30 CDT
(In reply to comment #44)
> The specialization can absolutely not be removed.  It will reintroduce a
> critical bug.  The pointer should never be deleted - ACE_TSS is designed to
> 'own' its pointers, yet it's being given a pointer to an *aggregated* member! 
> It's a horribly broken design.  Read my comments in bug 2963!!!

The patch removal has been reverted; it breaks the MSVC support. I agree that we have a design problem in this area. At this moment we have a huge memory leak.
Comment 46 Johnny Willemsen 2007-11-02 05:30:40 CDT
Revised reversal has been committed. With the partial patch (which did work on Linux), the valgrind build shows a drop in memory leaks from 3462 to 2808 for all the tests combined.

The basic design problem pointed out by Patrick should be resolved.
Comment 47 Patrick Bennett 2007-11-28 12:58:53 CST
So was code put back in that causes bug 2963 to be back?  That is a huge issue and causes outright crashes.  This never should have been changed back if so.  A crash is far worse than a memory leak. 
Comment 48 Johnny Willemsen 2007-11-28 13:10:49 CST
Yes, the code has been put back. We need a different solution that doesn't cause a leak.
Comment 49 Patrick Bennett 2007-11-28 14:38:43 CST
You need a solution that doesn't crash.  I'm floored that it was decided that crashing was 'ok' and a critical fix was removed.
Comment 50 Simon Massey 2007-11-29 03:08:52 CST
This should NOT have put the "bugzilla 2963 crash" back in. Why do you think it has?

The pointer should NOT be deleted with the current revised patch. Johnny modified the storing of this pointer so that the clean-up routine was NOT entered against this pointer, and therefore the template code was NOT being called. It was therefore safe to remove this template specialisation code, as it was totally redundant.
Comment 51 Patrick Bennett 2007-11-29 08:44:00 CST
Well, just looking at the latest diffs, it seems this will reintroduce 2963.  The code was basically put back the way it was originally.  Nothing is preventing the condition I described from happening anymore.  With the specialization removed, cleanup_tss will still try to delete an incorrect pointer on the heap as I described in 2963.
Comment 52 Simon Massey 2007-11-29 09:26:58 CST
Having looked a bit more closely at ace/tss_t.cpp, I'm inclined to agree with you. I thought that "this->tss_.ts_object (this);" didn't register the clean-up function, unlike the constructor, but this looks incorrect to me now.

My original thinking was:
-------------
There is more to this than the removal of the template specialization in Service_Config.h, namely the modification of the two constructors in Service_Config.cpp (see lines 387 and 457).

Removing the
      , tss_ (this)
from both constructors' initialization lists (therefore constructing them NULL), and adding the call
      this->tss_.ts_object (this);
in the body of these constructors, forces the pointer into the TSS but does not set the clean-up function for this pointer (it remains NULL).

Therefore there is NO clean-up function registered and NO clean-up will be performed for these pointers at exit. Since NO clean-up is registered, it makes no sense to declare the template specialization that performs a no-action clean-up, because it will now never be called.
----------------

However it looks like we need another function such as:

template <class TYPE> TYPE *
ACE_TSS<TYPE>::ts_object_nodelete (TYPE *new_ts_obj);

which then impacts
ACE_TSS_Adapter ((void *) new_ts_obj, ACE_TSS<TYPE>::cleanup)

as this doesn't allow for a null ACE_TSS<TYPE>::cleanup.
Comment 53 Simon Massey 2007-11-29 09:49:02 CST
Can't we just create a wrapper class around the ACE_Service_Gestalt that internally holds a simple ACE_Service_Gestalt * and simply doesn't delete it upon its own destruction (sort of like the inverse of an auto pointer), and register that with the TSS instead of the ACE_Service_Gestalt pointer?

This would also not need the template specializations.....

It would have to have members such as
class ACE_Service_Gestalt;  // forward declaration is enough for a pointer member

class nodelete_gestalt
{
public:
  nodelete_gestalt (ACE_Service_Gestalt *pointer) : pointer_ (pointer) {}
  ~nodelete_gestalt () {}  // deliberately does not delete pointer_
  operator ACE_Service_Gestalt * ();
  ACE_Service_Gestalt *operator-> ();
private:
  ACE_Service_Gestalt *pointer_;  // not owned
};
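
A self-contained sketch of how such a wrapper would behave when an owning cleanup routine deletes it. gestalt_stub stands in for ACE_Service_Gestalt, and nodelete_ptr and owning_cleanup are hypothetical names; the real ACE_TSS cleanup is only imitated here.

```cpp
#include <cassert>

// Stand-in for ACE_Service_Gestalt; counts destructor calls.
struct gestalt_stub
{
  static int destroyed;
  int value;
  ~gestalt_stub () { ++destroyed; }
};
int gestalt_stub::destroyed = 0;

// Non-owning wrapper: destroying it leaves the wrapped object alone.
class nodelete_ptr
{
public:
  explicit nodelete_ptr (gestalt_stub *pointer) : pointer_ (pointer) {}
  ~nodelete_ptr () {}   // deliberately does not delete pointer_
  operator gestalt_stub * () const { return pointer_; }
  gestalt_stub *operator-> () const { return pointer_; }
private:
  gestalt_stub *pointer_;   // not owned
};

// Imitates what an owning TSS cleanup does with the registered pointer.
static void owning_cleanup (void *p)
{
  delete static_cast<nodelete_ptr *> (p);
}

// Returns how many gestalt_stub objects the cleanup destroyed.
int stub_destroyed_by_cleanup ()
{
  gestalt_stub::destroyed = 0;
  gestalt_stub real;                      // owned by this scope, not by TSS
  real.value = 5;
  nodelete_ptr *wrapper = new nodelete_ptr (&real);
  assert ((*wrapper)->value == 5);        // access goes through to the object
  owning_cleanup (wrapper);               // deletes only the wrapper
  return gestalt_stub::destroyed;
}
```

The cleanup routine still frees something (the small wrapper), so nothing leaks from the TSS side, while the wrapped object stays under the control of its real owner.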
Comment 54 Iliyan Jeliazkov 2007-11-29 10:15:08 CST
(In reply to comment #53)
> Can't we just create a wrapper class around the ACE_Service_Gestalt that
> internally holds a simple *ACE_Service_Gestalt  and that simply doesn't delete
> this upon its own destruction (sort of like the inverse of an autopointer) and
> register it with the tss instead of the ACE_Service_Gestalt pointer.


Hi Simon, 

You've been reading my mind ... :)
 
The patch I've been working on lately does exactly that, based on an intrusive_ptr-like smart pointer.  Unfortunately, the SC stuff is pervasive and has many corner cases while the time I can spend on it is very limited - so my progress has been slow. Nevertheless, I will soon attach it here for public review and comments.

--Iliyan
Comment 55 Lothar Werzinger 2008-02-21 19:35:35 CST
I just checked with the latest release X.6.3 and it still crashes.
I think this is a SERIOUS bug and it needs to be fixed.

I don't care about memory leaks as much as I care about crashes. If the only way to solve this bug is to have a leak, so be it.

Maybe we can provide a preprocessor define for those who prefer to crash instead of leak :-)

Comment 56 Iliyan Jeliazkov 2008-02-22 09:21:19 CST
Hi Lothar,

(In reply to comment #55)
> I just checked with the latest release X.6.3 and it still crashes.
> I think this is a SERIOUS bug and it needs to be fixed.

Indeed, I agree and appreciate your help. I have been working on a parallel branch to refine the SC mechanism and address this problem on a more fundamental level. Look at svn://svn.dre.vanderbilt.edu/DOC/Middleware/branches/iliyan-gestalt 

> I don't care about memory leaks as much as I care about crashes. If the only
> way to solve this bug is to have a leak, so be it.

I think we can have our cake and eat it. Could you please do me a favor and try the branch I referred to above? Note that it includes your test as part of the ACE test suite - Bug_2980_Regression

> Maybe we can provide a preprocessor define for those who prefer to crash instead
> of leak :-)

:) it's an option, but let's consider it a last resort ...
Comment 57 Lothar Werzinger 2008-02-22 15:53:37 CST
(In reply to comment #56)

I created a patch from svn://svn.dre.vanderbilt.edu/DOC/Middleware/branches/iliyan-gestalt/ACE/ace
to my x.6.3 tree and built our application with that patched x.6.3 tree.

I do not experience any problems with the patch, so it's "works for me".

Thanks for the joint effort to put this nasty one to rest!
Comment 58 Iliyan Jeliazkov 2008-02-22 16:24:02 CST
(In reply to comment #57)
> (In reply to comment #56)
> 
> I created a patch from
> svn://svn.dre.vanderbilt.edu/DOC/Middleware/branches/iliyan-gestalt/ACE/ace
> to my x.6.3 tree and built our application with that patched x.6.3 tree.
> 
> I do not experience any problems with the patch, so it's "works for me".
> 
> Thanks for the joint effort to put this nasty one to rest!

Yippie kay ye!!! :))

Thanks, Lothar!

Just to clarify - I assume you tested with your 64-bit build (the one I had so much trouble reproducing)?

Also, did you try the Bug_2980_Regression_Test, or run your own?
Comment 59 Lothar Werzinger 2008-02-22 21:12:42 CST
(In reply to comment #58)

I tested 64-bit and 32-bit versions of our application.
I did not try the Bug_2980_Regression_Test (I only checked out your ace subdirectory to create the patch).

But I successfully tried the test case attached to this bug. Here's the log:

lothar@regensburg$ tar -xvf bug2980.tar.gz
bug2980/
bug2980/build.sh
bug2980/bug2980.mpc
bug2980/capi.mpc
bug2980/capi.cpp
bug2980/bug2980.cpp
</tmp>
lothar@regensburg$ cd bug2980/
</tmp/bug2980>
lothar@regensburg$ vim build.sh
</tmp/bug2980>
lothar@regensburg$ ./build.sh
ACE_ROOT=/opt2/linux/x86_64/ACE/1.6.3/ACE_wrappers
TAO_ROOT=/opt2/linux/x86_64/ACE/1.6.3/ACE_wrappers/TAO
CIAO_ROOT=/opt2/linux/x86_64/ACE/1.6.3/ACE_wrappers/TAO/CIAO
DDS_ROOT=/tmp/notthere
Using .../1.6.3/ACE_wrappers/bin/MakeProjectCreator/config/MPC.cfg
Generating 'gnuace' output using default input
Generation Time: 0s
make[1]: Entering directory `/tmp/bug2980'
touch .depend.capi
make[1]: Leaving directory `/tmp/bug2980'
make[1]: Entering directory `/tmp/bug2980'

GNUmakefile: /tmp/bug2980/GNUmakefile.capi MAKEFLAGS=w -- debug=1 optimize=0

rm -f -r \
        *.o *~ *.bak *.rpo *.sym lib*.*_pure_* \
        GNUmakefile.old core-r  \
        cxx_repository ptrepository ti_files \
        gcctemp.c gcctemp so_locations *.ics \
        templateregistry templateregistry.* ir.out core.* *.core  .shobj/capi.o
make[1]: Leaving directory `/tmp/bug2980'
make[1]: Entering directory `/tmp/bug2980'
touch .depend.bug2980
make[1]: Leaving directory `/tmp/bug2980'
make[1]: Entering directory `/tmp/bug2980'

GNUmakefile: /tmp/bug2980/GNUmakefile.bug2980 MAKEFLAGS=w -- debug=1 optimize=0

rm -f -r \
        *.o *~ *.bak *.rpo *.sym lib*.*_pure_* \
        GNUmakefile.old core-r  \
        cxx_repository ptrepository ti_files \
        gcctemp.c gcctemp so_locations *.ics \
        templateregistry templateregistry.* ir.out core.* *.core .obj/bug2980.o .obj/bug2980.o .obj/bug2980.o
make[1]: Leaving directory `/tmp/bug2980'
make[1]: Entering directory `/tmp/bug2980'

GNUmakefile: /tmp/bug2980/GNUmakefile.capi MAKEFLAGS=w -- debug=1 optimize=0

/opt2/linux/ix86/bin/g++-4.2.3 -m64 -I/opt2/linux/x86_64/include -W -Wall -Wpointer-arith  -g -pipe -DACE_USE_RCSID=0 -D_REENTRANT -DACE_HAS_AIO_CALLS -D_GNU_SOURCE -DACE_HAS_CUSTOM_EXPORT_MACROS=0   -I/opt2/linux/x86_64/ACE/1.6.3/ACE_wrappers -DACE_HAS_EXCEPTIONS -D__ACE_INLINE__ -I/opt2/linux/x86_64/ACE/1.6.3/ACE_wrappers -DACE_HAS_IPV6  -c -fPIC -o .shobj/capi.o capi.cpp
/opt2/linux/ix86/bin/g++-4.2.3 -DACE_USE_RCSID=0 -D_REENTRANT -DACE_HAS_AIO_CALLS -D_GNU_SOURCE -DACE_HAS_CUSTOM_EXPORT_MACROS=0   -I/opt2/linux/x86_64/ACE/1.6.3/ACE_wrappers -DACE_HAS_EXCEPTIONS -D__ACE_INLINE__ -I/opt2/linux/x86_64/ACE/1.6.3/ACE_wrappers -DACE_HAS_IPV6 -shared -Wl,-h -Wl,libcapi.so.5.6.3 -o libcapi.so.5.6.3 .shobj/capi.o -m64 -I/opt2/linux/x86_64/include -L/opt2/linux/x86_64/lib -Wl,-E -L/opt2/linux/x86_64/ACE/1.6.3/ACE_wrappers/ace -L./ -L/tmp/bug2980 -L. -L/opt2/linux/x86_64/ACE/1.6.3/ACE_wrappers/lib -lACE -ldl -lpthread -lrt
rm -f libcapi.so
/opt2/linux/x86_64/ACE/1.6.3/ACE_wrappers/bin/add_rel_link.sh libcapi.so.5.6.3 libcapi.so
ln -s libcapi.so.5.6.3 libcapi.so
chmod a+rx libcapi.so.5.6.3
make[1]: Leaving directory `/tmp/bug2980'
make[1]: Entering directory `/tmp/bug2980'

GNUmakefile: /tmp/bug2980/GNUmakefile.bug2980 MAKEFLAGS=w -- debug=1 optimize=0

/opt2/linux/ix86/bin/g++-4.2.3 -m64 -I/opt2/linux/x86_64/include -W -Wall -Wpointer-arith  -g -pipe -DACE_USE_RCSID=0 -D_REENTRANT -DACE_HAS_AIO_CALLS -D_GNU_SOURCE -DACE_HAS_CUSTOM_EXPORT_MACROS=0   -I/opt2/linux/x86_64/ACE/1.6.3/ACE_wrappers -DACE_HAS_EXCEPTIONS -D__ACE_INLINE__ -DUSE_THREAD  -c -o .obj/bug2980.o bug2980.cpp
bug2980.cpp: In function ‘int main(int, char**)’:
bug2980.cpp:77: warning: unused variable ‘result’
bug2980.cpp: At global scope:
bug2980.cpp:75: warning: unused parameter ‘argc’
bug2980.cpp:75: warning: unused parameter ‘argv’
/opt2/linux/ix86/bin/g++-4.2.3 -m64 -I/opt2/linux/x86_64/include -W -Wall -Wpointer-arith  -g -pipe -DACE_USE_RCSID=0 -D_REENTRANT -DACE_HAS_AIO_CALLS -D_GNU_SOURCE -DACE_HAS_CUSTOM_EXPORT_MACROS=0   -I/opt2/linux/x86_64/ACE/1.6.3/ACE_wrappers -DACE_HAS_EXCEPTIONS -D__ACE_INLINE__ -DUSE_THREAD  -m64 -I/opt2/linux/x86_64/include -L/opt2/linux/x86_64/lib -Wl,-E -L/opt2/linux/x86_64/ACE/1.6.3/ACE_wrappers/ace -L./ -L/opt2/linux/x86_64/ACE/1.6.3/ACE_wrappers/lib -L. -o bug2980 .obj/bug2980.o  -ldl -lpthread -lrt
make[1]: Leaving directory `/tmp/bug2980'
LD_LIBRARY_PATH=.:/opt2/linux/x86_64/ACE/1.6.3/ACE_wrappers/lib
main - entered
loadDll - entered
ACE::init()
loadDll - leaving
(26353|1082132816) capi_dosomething - entered
(26353|1082132816) capi_dosomething - leaving
unloadDll - entered
ACE::fini()
unloadDll - leaving
loadunloadDll thread finished
main - leaving
</tmp/bug2980>
Comment 60 Johnny Willemsen 2008-03-28 05:03:02 CDT
to Iliyan
Comment 61 Johnny Willemsen 2008-04-14 05:12:45 CDT
fixed as part of the upcoming x.6.4
Comment 62 Johnny Willemsen 2008-04-14 05:13:40 CDT
fixed
Comment 63 Steve Huston 2008-08-04 12:02:11 CDT
This problem is still occurring on AIX and HP-UX. It also occurs on Mac OS X and NetBSD, but I don't have access to those.

It does appear that pthreads is cleaning up TSS after ACE has been unloaded and crashing on a bad pointer.

Any hints on where to zero in for this?
Comment 64 Johnny Willemsen 2008-08-04 12:24:11 CDT
Steve, can you try to set ACE_HAS_BROKEN_THREAD_KEYFREE for hpux/aix?
Comment 65 Steve Huston 2008-08-04 14:34:40 CDT
I could, but I'm reluctant to disable a working OS feature. Prior notes in this thread indicate that it would resolve this particular use case, but that's not really a fix.

Based on earlier diagnosis, what TSS is likely to be causing the issue here?
Comment 66 Lothar Werzinger 2008-08-04 15:43:43 CDT
(In reply to comment #65)
As I stated in comment #32

I believe that as long as the SAME thread that loads the library also unloads the library it should work IF ACE were to manually unset the TSS value.

From the pthread_key_delete manpage:
> The pthread_key_delete() function shall be callable from within destructor
> functions. No destructor functions shall be invoked by pthread_key_delete().
> Any destructor function that may have been associated with key shall no
> longer be called upon thread exit.

I believe the TSS on Linux, AIX, MacOS, ... works as intended and that the usage of ACE_HAS_BROKEN_THREAD_KEYFREE is a BUG.

Just manually call pthread_key_delete upon unloading the library.
Comment 67 Steve Huston 2008-08-04 15:56:04 CDT
(In reply to comment #66)
> (In reply to comment #65)
> As I stated in comment #32
> 
> I believe that as long as the SAME thread that loads the library also unloads
> the library it should work IF ACE were to manually unset the TSS value.

Right - it should also work if the last thread with ACE TSS values set unloads the library (after calling ACE::fini()).

> From the pthread_key_delete manpage:
> > The pthread_key_delete() function shall be callable from within destructor
> > functions. No destructor functions shall be invoked by pthread_key_delete().
> > Any destructor function that may have been associated with key shall no
> > longer be called upon thread exit.

Right - after spending some time with Butenhof's "Programming with POSIX Threads" this is what I understand as well.

> I believe the TSS on Linux, AIX, MacOS, ... works as intended and that the
> usage of ACE_HAS_BROKEN_THREAD_KEYFREE is a BUG.

I agree.

> Just manually call pthread_key_delete upon unloading the library.

I don't think this will be correct (but I could be wrong) because we may not know when the library is about to be unloaded... but if we set the TSS value to 0 (which we do know is legit) that should work as well.
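
To make the two options concrete, here is a rough sketch using raw pthreads rather than ACE. It counts how often a key's destructor runs at thread exit when the thread (a) does nothing, (b) resets its TSS value to 0, or (c) deletes the key. All names (Unload_Strategy, count_cleanup, tss_user, cleanup_calls_at_thread_exit) are made up for this illustration, and it deliberately ignores the dlopen/dlclose side of the problem.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical knob for this sketch: what the thread does before it exits.
enum Unload_Strategy { DO_NOTHING, RESET_VALUE, DELETE_KEY };

namespace
{
  // Written only by the worker thread, read only after pthread_join().
  int cleanup_calls = 0;

  // Stand-in for a TSS clean-up hook that lives in the unloaded library.
  void count_cleanup (void *) { ++cleanup_calls; }

  void *tss_user (void *arg)
  {
    Unload_Strategy const strategy = *static_cast<Unload_Strategy *> (arg);
    static int payload = 0;
    pthread_key_t key;

    int rc = pthread_key_create (&key, count_cleanup);
    assert (rc == 0);
    rc = pthread_setspecific (key, &payload);
    assert (rc == 0);

    if (strategy == RESET_VALUE)
      rc = pthread_setspecific (key, 0);   // a NULL value is skipped at thread exit
    else if (strategy == DELETE_KEY)
      rc = pthread_key_delete (key);       // the destructor is no longer associated
    assert (rc == 0);
    (void) rc;
    return 0;                              // thread exit: TSS destructors run here
  }
}

// Runs one thread with the given strategy and returns how many times the
// clean-up hook was invoked when that thread exited.
int cleanup_calls_at_thread_exit (Unload_Strategy strategy)
{
  cleanup_calls = 0;
  pthread_t tid;
  int rc = pthread_create (&tid, 0, tss_user, &strategy);
  assert (rc == 0);
  rc = pthread_join (tid, 0);
  assert (rc == 0);
  (void) rc;
  return cleanup_calls;
}
```

Going by the POSIX wording quoted above, the hook should run once for DO_NOTHING and not at all for RESET_VALUE or DELETE_KEY. The difference is that resetting the value only affects the calling thread, while deleting the key affects every thread that used it.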

Essentially, the code in the ACE_HAS_BROKEN_THREAD_KEYFREE branch is probably what should be done anyway - it's not broken.
What do you think?
Comment 68 Lothar Werzinger 2008-08-04 16:01:26 CDT
Here is some more "wisdom" from the pthread_key_delete manpage on my Linux system:

A thread-specific data key deletion function has been included in order to allow the resources associated with an unused thread-specific data key to be freed. Unused thread-specific data keys can arise, among other scenarios, when a dynamically loaded module that allocated a key is unloaded. 

 Conforming applications are responsible for performing any cleanup actions needed for data structures associated with the key to be deleted, including data referenced by thread-specific data values. No such cleanup is done by pthread_key_delete(). In particular, destructor functions are not called. There are several reasons for this division of responsibility:
Comment 69 Steve Huston 2008-08-05 14:25:37 CDT
Fixed:
Tue Aug  5 16:41:03 UTC 2008  Steve Huston  <shuston@riverace.com>

Essentially, this removes the #ifdef ACE_HAS_BROKEN_THREAD_KEYFREE - all platforms act this way, and it's correct. Also removed ACE_HAS_BROKEN_THREAD_KEYFREE from config-linux-common.h since it was only added for this case and is not needed.

Thus, the only platform still with ACE_HAS_BROKEN_THREAD_KEYFREE set is LynxOS. From comments as to why it's there, it should also be removed, but I don't have a way to test this.