Bug 2818

Summary: Notification service core dumps when using -ORBDaemon param
Product: TAO Reporter: Sam <janka.ace>
Component: Notification ServiceAssignee: Iliyan Jeliazkov <iliyan>
Status: REOPENED ---    
Severity: major CC: janka.ace
Priority: P1    
Version: 1.5.10   
Hardware: SPARC   
OS: Solaris   
Bug Depends on:    
Bug Blocks: 3043    

Description Sam 2007-02-19 03:59:47 CST
Notification service core dumps with SIGBUS signal when using it with -ORBDaemon
parameter, to daemonize it. Previous version (1.5.2) work correctly.
Comment 1 Johnny Willemsen 2007-02-19 04:08:19 CST
Steve, can someon of OCI have a look at this?
Comment 2 Steve Totten 2007-02-19 09:54:47 CST
Accepted on behalf of OCI.
Comment 3 Steve Totten 2007-02-19 09:59:06 CST
janka.ace@mailnull.com (Sam) wrote:
> Notification service core dumps with SIGBUS signal when using it with
> -ORBDaemon parameter, to daemonize it. Previous version (1.5.2) work
> correctly.

Can you provide more information about the failure?  For example:

- Stack trace from the core file
- Complete list of command line parameters provided to the Notification
  Service server
- Any Service Configurator (svc.conf) options provided to the Notification
  Service server

Thanks!
Steve Totten
Comment 4 Sam 2007-02-19 10:18:36 CST
COMMAND:
 LD_LIBRARY_PATH=/opt/TAO-1.5.6/lib /opt/TAO-1.5.6/bin/Notify_Service -IORoutput
/tmp/nos.ior -DispatchingThreads 10 -ORBRunThreads 10 -AllocateTaskperProxy
-AsynchUpdates -ORBEndpoint iiop://1.0@belladona-3 -ORBInitRef
NameService=file:///tmp/ns.ior -ORBDebugLevel 10 -ORBLogFile=/tmp/logx -ORBDaemon

It doesn't creates core dump, i see it make sigbus only when running via truss
(solaris strace). Here is its output:

3/1:         lwp_park(0x00000000, 0)                         = 0
1643/1:         lwp_sigmask(SIG_SETMASK, 0xFFBFFEFF, 0x0000FFF7) = 0xFFBFFEFF
[0x0000FFFF]
1643/1:         lwp_sigmask(SIG_SETMASK, 0x00000000, 0x00000000) = 0xFFBFFEFF
[0x0000FFFF]
1643/1:         lwp_sigmask(SIG_SETMASK, 0xFFBFFEFF, 0x0000FFF7) = 0xFFBFFEFF
[0x0000FFFF]
1643/1:         lwp_sigmask(SIG_SETMASK, 0x00000000, 0x00000000) = 0xFFBFFEFF
[0x0000FFFF]
1643/1:         fstat64(3, 0xFFBFEBA0)                          = 0
1643/1:         fstat64(3, 0xFFBFEA48)                          = 0
1643/1:         ioctl(3, TCGETA, 0xFFBFEB2C)                    Err#22 EINVAL
1643/1:         write(3, " I O R : 0 0 0 0 0 0 0 0".., 250)     = 250
1643/1:         close(3)                                        = 0
1643/1:         lwp_sigmask(SIG_SETMASK, 0xFFBFFEFF, 0x0000FFF7) = 0xFFBFFEFF
[0x0000FFFF]
1643/1:         lwp_sigmask(SIG_SETMASK, 0x00000000, 0x00000000) = 0xFFBFFEFF
[0x0000FFFF]
1643/3:         lwp_park(0x00000000, 0)                         = 0
1643/3:         lwp_sigmask(SIG_SETMASK, 0xFFBFFEFF, 0x0000FFF7) = 0xFFBFFEFF
[0x0000FFFF]
1643/3:         lwp_sigmask(SIG_SETMASK, 0x00000000, 0x00000000) = 0xFFBFFEFF
[0x0000FFFF]
1643/3:         pollsys(0xFDE7B9E8, 3, 0x00000000, 0x00000000)  = 1
1643/3:         read(2, " I O R : 0 0 0 0", 8)                  = 8
1643/3:         lwp_unpark(2)                                   = 0
1643/3:             Incurred fault #5, FLTACCESS  %pc = 0xFE5B8E2C
1643/3:               siginfo: SIGBUS BUS_ADRALN addr=0x494F5276
1643/3:             Received signal #10, SIGBUS [default]
1643/3:               siginfo: SIGBUS BUS_ADRALN addr=0x494F5276


Ok, and when running without truss it simply exits with retval 0, and i there is
no information about any problem, it just doesn't runs. Btw. Naming_Service
version 1.5.6 with -ORBDaemon param works without problems. Notification_Service
 from 1.5.2 with the same params, same environment work correctly.
Comment 5 Iliyan Jeliazkov 2007-02-19 10:30:23 CST
Sam, thanks a lot for the additional information. This would help reproducing
the problem. Can I also ask you to add ACE_DEBUG=1 to your commadline and post
the resulting output (run it directly, without truss)?

Thanks,

--Iliyan
Comment 6 Steve Totten 2007-02-19 10:42:58 CST
janka.ace@mailnull.com (Sam) wrote:
> COMMAND:
>  LD_LIBRARY_PATH=/opt/TAO-1.5.6/lib /opt/TAO-1.5.6/bin/Notify_Service -IORoutput
> /tmp/nos.ior -DispatchingThreads 10 -ORBRunThreads 10 -AllocateTaskperProxy
> -AsynchUpdates -ORBEndpoint iiop://1.0@belladona-3 -ORBInitRef
> NameService=file:///tmp/ns.ior -ORBDebugLevel 10 -ORBLogFile=/tmp/logx -ORBDaemon

That's a lot of threads.  I don't know if you realize, but with
-AllocateTaskperProxy, the number of dispatching threads specified with
-DispatchingThreads will be multiplied on a per supplier proxy basis.
So, each new consumer that connects will get a new supplier proxy (unless
it connects to an existing supplier proxy, but that's not the usual usage
scenario), which will mean another 10 dispatching threads will be created.
If you had 20 consumers, that's 200 dispatching threads, plus the 10
ORBRunThreads.

Since you are capturing ORB debug output in a file, can you attach the
contents of that file?  It may provide some clues.

> It doesn't creates core dump,

Perhaps ulimit is not set to allow core dumps on your system for your
userid.  What does "ulimit -c" show?

> i see it make sigbus only when running via truss
> (solaris strace). Here is its output:
> 
> 3/1:         lwp_park(0x00000000, 0)                         = 0
> 1643/1:         lwp_sigmask(SIG_SETMASK, 0xFFBFFEFF, 0x0000FFF7) = 0xFFBFFEFF
> [0x0000FFFF]
> 1643/1:         lwp_sigmask(SIG_SETMASK, 0x00000000, 0x00000000) = 0xFFBFFEFF
> [0x0000FFFF]
> 1643/1:         lwp_sigmask(SIG_SETMASK, 0xFFBFFEFF, 0x0000FFF7) = 0xFFBFFEFF
> [0x0000FFFF]
> 1643/1:         lwp_sigmask(SIG_SETMASK, 0x00000000, 0x00000000) = 0xFFBFFEFF
> [0x0000FFFF]
> 1643/1:         fstat64(3, 0xFFBFEBA0)                          = 0
> 1643/1:         fstat64(3, 0xFFBFEA48)                          = 0
> 1643/1:         ioctl(3, TCGETA, 0xFFBFEB2C)                    Err#22 EINVAL
> 1643/1:         write(3, " I O R : 0 0 0 0 0 0 0 0".., 250)     = 250
> 1643/1:         close(3)                                        = 0
> 1643/1:         lwp_sigmask(SIG_SETMASK, 0xFFBFFEFF, 0x0000FFF7) = 0xFFBFFEFF
> [0x0000FFFF]
> 1643/1:         lwp_sigmask(SIG_SETMASK, 0x00000000, 0x00000000) = 0xFFBFFEFF
> [0x0000FFFF]
> 1643/3:         lwp_park(0x00000000, 0)                         = 0
> 1643/3:         lwp_sigmask(SIG_SETMASK, 0xFFBFFEFF, 0x0000FFF7) = 0xFFBFFEFF
> [0x0000FFFF]
> 1643/3:         lwp_sigmask(SIG_SETMASK, 0x00000000, 0x00000000) = 0xFFBFFEFF
> [0x0000FFFF]
> 1643/3:         pollsys(0xFDE7B9E8, 3, 0x00000000, 0x00000000)  = 1
> 1643/3:         read(2, " I O R : 0 0 0 0", 8)                  = 8
> 1643/3:         lwp_unpark(2)                                   = 0
> 1643/3:             Incurred fault #5, FLTACCESS  %pc = 0xFE5B8E2C
> 1643/3:               siginfo: SIGBUS BUS_ADRALN addr=0x494F5276
> 1643/3:             Received signal #10, SIGBUS [default]
> 1643/3:               siginfo: SIGBUS BUS_ADRALN addr=0x494F5276
> 
> 
> Ok, and when running without truss it simply exits with retval 0, and i there is
> no information about any problem, it just doesn't runs. Btw. Naming_Service
> version 1.5.6 with -ORBDaemon param works without problems. Notification_Service
>  from 1.5.2 with the same params, same environment work correctly.

I see you are using g++ 4.1.1 on Solaris (you didn't specify which
version of Solaris).  Is it possible the Notification Service is linked
with libraries built with either a different version of g++, or with
a different compiler (e.g., a Sun compiler)?  What does
"cd $TAO_ROOT/orbsvcs/Notify_Service; ldd Notify_Service" show?

Steve Totten
Comment 7 Steve Totten 2007-02-19 14:59:36 CST
Reassigning to Iliyan.
Comment 8 Sam 2007-02-20 04:49:30 CST
Sorry i've not pasted whole uname: HW is Solaris 10, latest patches.
All used software is self compiled, including gcc (bootstraped) and TAO, linking
using Sun's solaris linker bundled with sol10 (because gnu binutils works
horibly on solaris). All C++ libraries used by Notification are build with the
same version of g++ (gcc and SUN compiler have different mangling schemes, so
it's quite impossible to mix them, and SUN compiler had ago some problems
compiling TAO when i've tried it). But here is ldd:

ldd of Notifcation_Server:
LD_LIBRARY_PATH=/opt/TAO-1.5.6/lib ldd /opt/TAO-1.5.6/bin/Notify_Service
        libTAO_IORTable.so.1 =>  /opt/TAO-1.5.6/lib/libTAO_IORTable.so.1
        libTAO_CosNaming.so.1 =>         /opt/TAO-1.5.6/lib/libTAO_CosNaming.so.1
        libTAO_CosNotification_Serv.so.1 =>     
/opt/TAO-1.5.6/lib/libTAO_CosNotification_Serv.so.1
        libTAO_ETCL.so.1 =>      /opt/TAO-1.5.6/lib/libTAO_ETCL.so.1
        libTAO_DynamicAny.so.1 =>        /opt/TAO-1.5.6/lib/libTAO_DynamicAny.so.1
        libTAO_CosNotification_Skel.so.1 =>     
/opt/TAO-1.5.6/lib/libTAO_CosNotification_Skel.so.1
        libTAO_Svc_Utils.so.1 =>         /opt/TAO-1.5.6/lib/libTAO_Svc_Utils.so.1
        libTAO_Valuetype.so.1 =>         /opt/TAO-1.5.6/lib/libTAO_Valuetype.so.1
        libTAO_CosEvent_Skel.so.1 =>    
/opt/TAO-1.5.6/lib/libTAO_CosEvent_Skel.so.1
        libTAO_CosNotification.so.1 =>  
/opt/TAO-1.5.6/lib/libTAO_CosNotification.so.1
        libTAO_CosEvent.so.1 =>  /opt/TAO-1.5.6/lib/libTAO_CosEvent.so.1
        libTAO_PortableServer.so.1 =>   
/opt/TAO-1.5.6/lib/libTAO_PortableServer.so.1
        libTAO_AnyTypeCode.so.1 =>       /opt/TAO-1.5.6/lib/libTAO_AnyTypeCode.so.1
        libTAO.so.1 =>   /opt/TAO-1.5.6/lib/libTAO.so.1
        libACE-5.5.6.so =>       /opt/TAO-1.5.6/lib/libACE-5.5.6.so
        librt.so.1 =>    /lib/librt.so.1
        libgen.so.1 =>   /lib/libgen.so.1
        libnsl.so.1 =>   /lib/libnsl.so.1
        libsocket.so.1 =>        /lib/libsocket.so.1
        libstdc++.so.6 =>        /opt/TAO-1.5.6/lib/libstdc++.so.6
        libm.so.2 =>     /lib/libm.so.2
        libc.so.1 =>     /lib/libc.so.1
        libgcc_s.so.1 =>         /opt/devtools/lib/libgcc_s.so.1
        libaio.so.1 =>   /lib/libaio.so.1
        libmd5.so.1 =>   /lib/libmd5.so.1
        libmp.so.2 =>    /lib/libmp.so.2
        libscf.so.1 =>   /lib/libscf.so.1
        libdoor.so.1 =>  /lib/libdoor.so.1
        libuutil.so.1 =>         /lib/libuutil.so.1
        /platform/SUNW,Sun-Fire-V240/lib/libc_psr.so.1
        /platform/SUNW,Sun-Fire-V240/lib/libmd5_psr.so.1



I could finaly produce core-dump (don't know why it didn't worked before),
backtrace:

#0  0xfe0b0bb8 in strlen () from /lib/libc.so.1
(gdb) bt
#0  0xfe0b0bb8 in strlen () from /lib/libc.so.1
#1  0xfe116138 in _ndoprnt () from /lib/libc.so.1
#2  0xfe118858 in vsnprintf () from /lib/libc.so.1
#3  0xfe5d8524 in ACE_OS::snprintf (buf=0x2e0f9 "", maxlen=4056,
format=0xffbfef11 "%s: %s") at ../../ace/OS_NS_stdio.inl:927
#4  0xfe5c6d50 in ACE_Log_Msg::log (this=0x2e0f9, format_str=0xfd8 <Address
0xfd8 out of bounds>, log_priority=4290768657, argp=0xd)
    at ../../ace/Log_Msg.cpp:1274
#5  0xfe5c7334 in ACE_Log_Msg::log (this=0x2dc78, log_priority=94511,
format_str=0x80 <Address 0x80 out of bounds>)
    at ../../ace/Log_Msg.cpp:951
#6  0x00013fc8 in TAO_Notify_Service_Driver::parse_args ()
#7  0x000147ec in TAO_Notify_Service_Driver::init ()
#8  0x00013b7c in main ()

ORBLogFile has not any contents, it is not even created.

ACE_DEBUG=1 output:

// (14372|0) SG::insert repo=291b8 (opened=0) - enqueue TAO_CosNotify_Service,
active=0.
// (14372|0) SG::insert repo=291b8 (opened=0) - enqueue
Notify_Default_Event_Manager_Objects_Factory, active=0.
ACE (14372|1) SG::process_directive_i, repo=291b8 -
Concrete_Valuetype_Adapter_Factory, dll=<null>, force=0
ACE (14372|1) SR::insert, repo=291b8 [34] (1024),
name=Concrete_Valuetype_Adapter_Factory, type=2fa10, impl=2f9b8, object=0,
active=-32046948
ACE (14372|1) SG::add_processed_statisc_svc, repo=291b8 -
Concrete_Valuetype_Adapter_Factory
ACE (14372|1) DSB::instance, repo=291b8, name=TAO_ORB_Core_Static_Resources
type=2e588 => 2f568
// (14372|0) SG::insert repo=291b8 (opened=0) - enqueue TAO_CosNotify_Service,
active=0.
// (14372|0) SG::insert repo=291b8 (opened=0) - enqueue
Notify_Default_Event_Manager_Objects_Factory, active=0.
// (14372|0) SG::insert repo=291b8 (opened=0) - enqueue TAO_CosNotify_Service,
active=0.
// (14372|0) SG::insert repo=291b8 (opened=0) - enqueue
Notify_Default_Event_Manager_Objects_Factory, active=0.
Segmentation Fault (core dumped)
Comment 9 Johnny Willemsen 2007-02-20 05:56:20 CST
looks a problem using '%p' in ACE LogMsg. Seems a nill string is used which can
cause a crash on solaris. Maybe use the same logic as with '%s' in log_msg, if
the string is 0, then print (null). This then seems more a problem in ACE
Comment 10 Chad Elliott 2007-04-04 08:04:49 CDT
Wed Apr  4 13:04:25 UTC 2007  Chad Elliott  <elliott_c@ociweb.com>

        * orbsvcs/Notify_Service/Notify_Service.cpp:
                                                    
          Added the missing parameter for %p in an ACE_ERROR_RETURN().

Comment 11 Sam 2007-08-02 07:27:27 CDT
Hi, got same problem with latest TAO 1.5.10, same scenario, same environment, only TAO upgraded. Again without -ORBDaemon Notify service works just fine.

Command line:

/opt/TAO/bin/Notify_Service -IORoutput /tmp/nos.ior -ORBRunThreads 10 -AllocateTaskperProxy -AsynchUpdates -ORBDebugLevel 10 -ORBLogFile /tmp/nos.log -ORBEndpoint iiop://1.0@belladona-3 -ORBInitRef NameService=file:///tmp/ns.ior -ORBDaemon


ORBDebugLog:

TAO (24812|1) Default_Resource_Factory - codeset manager=3bbb8
TAO (24812|1) - Codeset_Manager_i::init_ccs, Loaded Codeset translator <UTF8_Latin1_Factory>, ncs = 00010001 tcs = 05010001
TAO (24812|1) - UTF16_BOM_Translator: forceBE 0
TAO (24812|1) Loaded default protocol <IIOP_Factory>
TAO (24812|1) created new ORB <>
TAO (24812|1) - IIOP_Acceptor::open, address==belladona-3, options=(null)
TAO (24812|1) - IIOP_Acceptor::open, specified host=belladona-3:0
TAO (24812|1) - IIOP_Acceptor::open_i, listening on: <belladona-3:41779>
Loading the Cos Notification Service...
in TAO_Properties ctos 3e374
Running 10 ORB threads

Starting up the Notification Service...
object:499e0  created
object:497e8  created
Created POA : 1
(24812|1) Topology persistence disabled.
Activating object with id = 1 in  POA : 1
object:499e0 incr refcount = 1
(24812|1) TAO_Notify_EventChannelFactory::activate_selfTAO (24812|1) - Transport_Cache_Manager::find_i, no idle transport is available
TAO (24812|1) - Transport_Cache_Manager::fill_set_i, current_size = 0, cache_maximum = 128
TAO (24812|1) - IIOP_Connector::begin_connection, to <belladona-3:40940> which should block
TAO (24812|1) - IIOP_Connection_Handler::open, The local addr is <192.168.0.103:41780>
TAO (24812|1) - IIOP_Connection_Handler::open, IIOP connection to peer <192.168.0.103:40940> on 7
TAO (24812|1) - IIOP_Connector::make_connection, new connected connection to <belladona-3:40940> on Transport[7]
TAO (24812|1) - Transport_Cache_Manager::bind_i, 0xffbfedc4 -> 0x6ca58 Transport[7]
TAO (24812|1) - Transport_Cache_Manager::bind_i, cache size is [1]
TAO (24812|1) - Transport[7]::register_handler
TAO (24812|1) - Transport_Connector::connect, opening Transport[7] in TAO_CLIENT_ROLE
TAO (24812|1) - Codeset_Manager_i::set_tcs, No codeset component in profile
TAO (24812|1) - Muxed_TMS[7]::request_id, <1>
TAO (24812|1) - Codeset_Manager_i::generate_service_context, using tcs_c = 00010001, tcs_w = 00010109
TAO (24812|1) - GIOP_Message_Base::dump_msg, send GIOP v1.0 msg, 142 data bytes, my endian, Type Request[1]
GIOP message - HEXDUMP 154 bytes
47 49 4f 50 01 00 00 00  00 00 00 8e 00 00 00 01   GIOP............
00 00 00 01 00 00 00 0c  00 00 00 00 00 01 00 01   ................
00 01 01 09 00 00 00 01  01 00 00 00 00 00 00 3a   ...............:
14 01 0f 00 4e 55 50 00  00 00 1c 00 00 00 00 00   ....NUP.........
00 00 01 52 6f 6f 74 50  4f 41 00 4e 61 6d 65 53   ...RootPOA.NameS
65 72 76 69 63 65 00 00  00 00 00 00 00 00 01 4e   ervice.........N
61 6d 65 53 65 72 76 69  63 65 00 00 00 00 00 08   ameService......
74 6f 5f 6e 61 6d 65 00  00 00 00 00 00 00 00 1a   to_name.........
4e 6f 74 69 66 79 45 76  65 6e 74 43 68 61 6e 6e   NotifyEventChann
65 6c 46 61 63 74 6f 72  79 00                     elFactory.
TAO (24812|1) - Transport[7]::cleanup_queue, byte_count = 154
TAO (24812|1) - Transport[7]::cleanup_queue, after transfer, bc = 0, all_sent = 1, ml = 0
TAO (24812|1) - Transport[7]::drain_queue_helper, byte_count = 154, head_is_empty = 1
TAO (24812|1) - Transport[7]::drain_queue_i, helper retval = 1
TAO (24812|1) - Transport[7]::make_idle
TAO (24812|1) - Leader_Follower[7]::wait_for_event, (leader) enter reactor event loop
TAO (24812|1) - Connection_Handler[7]::handle_input, handle = 7/7
TAO (24812|1) - Transport[7]::handle_input
TAO (24812|1) - Transport[7]::process_queue_head, 0 enqueued
TAO (24812|1) - Transport[7]::handle_input_parse_data, enter
TAO (24812|1) - Transport[7]::handle_input_parse_data, read 65 bytes
TAO (24812|1) - GIOP_Message_State::parse_message_header_i
TAO (24812|1) - GIOP_Message_State::get_version_info
TAO (24812|1) - GIOP_Message_State::get_byte_order_info
TAO (24812|1) - Transport[7]::process_parsed_messages, entering (missing data == 0)
TAO (24812|1) - GIOP_Message_Base::dump_msg, recv GIOP v1.0 msg, 53 data bytes, my endian, Type Reply[1]
GIOP message - HEXDUMP 65 bytes
47 49 4f 50 01 00 00 01  00 00 00 35 00 00 00 00   GIOP.......5....
00 00 00 01 00 00 00 00  00 00 00 01 00 00 00 1a   ................
4e 6f 74 69 66 79 45 76  65 6e 74 43 68 61 6e 6e   NotifyEventChann
65 6c 46 61 63 74 6f 72  79 00 2f 48 00 00 00 01   elFactory./H....
00                                                 .
TAO (24812|1) - TAO_Muxed_TMS::dispatch_reply, id = 1
TAO (24812|1) - Connection_Handler[7]::handle_input, handle = 7/7, retval = 0
TAO (24812|1) - Leader_Follower[7]::wait_for_event, (leader) exit reactor event loop
TAO (24812|1) - Transport_Cache_Manager::is_entry_idle, state is [0]
TAO (24812|1) - Transport_Cache_Manager::find_i, at index 0 (Transport[7]) - idle
TAO (24812|1) - Transport_Connector::connect, got an existing connected Transport[7] in role TAO_CLIENT_ROLE
TAO (24812|1) - Muxed_TMS[7]::request_id, <2>
TAO (24812|1) - GIOP_Message_Base::dump_msg, send GIOP v1.0 msg, 255 data bytes, my endian, Type Request[2]
GIOP message - HEXDUMP 267 bytes
47 49 4f 50 01 00 00 00  00 00 00 ff 00 00 00 00   GIOP............
00 00 00 02 01 00 00 0c  00 00 00 3a 14 01 0f 00   ...........:....
4e 55 50 00 00 00 1c 00  00 00 00 00 00 00 01 52   NUP............R
6f 6f 74 50 4f 41 00 4e  61 6d 65 53 65 72 76 69   ootPOA.NameServi
63 65 00 00 00 00 00 00  00 00 01 4e 61 6d 65 53   ce.........NameS
65 72 76 69 63 65 00 00  00 00 00 07 72 65 62 69   ervice......rebi
6e 64 00 53 00 00 00 00  00 00 00 01 00 00 00 1a   nd.S............
4e 6f 74 69 66 79 45 76  65 6e 74 43 68 61 6e 6e   NotifyEventChann
65 6c 46 61 63 74 6f 72  79 00 74 43 00 00 00 01   elFactory.tC....
00 6c 46 61 00 00 00 26  49 44 4c 3a 4e 6f 74 69   .lFa...&IDL:Noti
66 79 45 78 74 2f 45 76  65 6e 74 43 68 61 6e 6e   fyExt/EventChann
65 6c 46 61 63 74 6f 72  79 3a 31 2e 30 00 00 00   elFactory:1.0...
00 00 00 01 00 00 00 00  00 00 00 3f 00 01 00 00   ...........?....
00 00 00 0c 62 65 6c 6c  61 64 6f 6e 61 2d 33 00   ....belladona-3.
a3 33 00 00 00 00 00 23  14 01 0f 00 4e 55 54 46   .3.....#....NUTF
b1 cb d2 00 0b ba 4f 00  00 00 01 00 00 00 01 00   ......O.........
00 00 00 00 00 00 01 00  00 00 01                  ...........
TAO (24812|1) - Transport[7]::cleanup_queue, byte_count = 267
TAO (24812|1) - Transport[7]::cleanup_queue, after transfer, bc = 0, all_sent = 1, ml = 0
TAO (24812|1) - Transport[7]::drain_queue_helper, byte_count = 267, head_is_empty = 1
TAO (24812|1) - Transport[7]::drain_queue_i, helper retval = 1
TAO (24812|1) - Transport[7]::make_idle
TAO (24812|1) - Leader_Follower[7]::wait_for_event, (leader) enter reactor event loop
TAO (24812|1) - Connection_Handler[7]::handle_input, handle = 7/7
TAO (24812|1) - Transport[7]::handle_input
TAO (24812|1) - Transport[7]::process_queue_head, 0 enqueued
TAO (24812|1) - Transport[7]::handle_input_parse_data, enter
TAO (24812|1) - Transport[7]::handle_input_parse_data, read 24 bytes
TAO (24812|1) - GIOP_Message_State::parse_message_header_i
TAO (24812|1) - GIOP_Message_State::get_version_info
TAO (24812|1) - GIOP_Message_State::get_byte_order_info
TAO (24812|1) - Transport[7]::process_parsed_messages, entering (missing data == 0)
TAO (24812|1) - GIOP_Message_Base::dump_msg, recv GIOP v1.0 msg, 12 data bytes, my endian, Type Reply[2]
GIOP message - HEXDUMP 24 bytes
47 49 4f 50 01 00 00 01  00 00 00 0c 00 00 00 00   GIOP............
00 00 00 02 00 00 00 00                            ........
TAO (24812|1) - TAO_Muxed_TMS::dispatch_reply, id = 2
TAO (24812|1) - Connection_Handler[7]::handle_input, handle = 7/7, retval = 0
TAO (24812|1) - Leader_Follower[7]::wait_for_event, (leader) exit reactor event loop
Registered with the naming service as: NotifyEventChannelFactory
../../../../TAO/orbsvcs/Notify_Service/Notify_Service.cpp: Running the Notification Service
TAO (24812|2) - ORB_Core::run, start [run]
TAO (24812|2) - ORB_Core::run, calling handle_events()
TAO (24812|3) - ORB_Core::run, start [run]
TAO (24812|3) - ORB_Core::run, calling handle_events()
TAO (24812|4) - ORB_Core::run, start [run]
TAO (24812|4) - ORB_Core::run, calling handle_events()
TAO (24812|5) - ORB_Core::run, start [run]
TAO (24812|5) - ORB_Core::run, calling handle_events()
TAO (24812|6) - ORB_Core::run, start [run]
TAO (24812|6) - ORB_Core::run, calling handle_events()
TAO (24812|7) - ORB_Core::run, start [run]
TAO (24812|7) - ORB_Core::run, calling handle_events()
TAO (24812|8) - ORB_Core::run, start [run]
TAO (24812|8) - ORB_Core::run, calling handle_events()
TAO (24812|9) - ORB_Core::run, start [run]
TAO (24812|9) - ORB_Core::run, calling handle_events()
TAO (24812|10) - ORB_Core::run, start [run]
TAO (24812|10) - ORB_Core::run, calling handle_events()
TAO (24812|11) - ORB_Core::run, start [run]
TAO (24812|11) - ORB_Core::run, calling handle_events()
TAO (24812|2) - Transport_Cache_Manager::fill_set_i, current_size = 1, cache_maximum = 128
TAO (24812|2) - Concurrency_Strategy::activate_svc_handler, opened as TAO_SERVER_ROLE
TAO (24812|2) - IIOP_Connection_Handler::open, The local addr is <192.168.0.103:41779>
TAO (24812|2) - IIOP_Connection_Handler::open, IIOP connection to peer <192.168.0.103:41788> on 3
TAO (24812|2) - Transport_Cache_Manager::bind_i, 0xfdffb854 -> 0x92628 Transport[3]
TAO (24812|2) - Transport_Cache_Manager::bind_i, cache size is [2]
TAO (24812|2) - Transport[3]::register_handler
sleep_hook failed: Broken pipe


TRACE:

26157/2:        lwp_sigmask(SIG_SETMASK, 0xFFBFFEFF, 0x0000FFF7) = 0xFFBFFEFF [0x0000FFFF]
26157/2:        write(2, " T A O   ( 2 6 1 5 7 | 2".., 84)      = 84
26157/2:        lwp_sigmask(SIG_SETMASK, 0x00000000, 0x00000000) = 0xFFBFFEFF [0x0000FFFF]
26157/2:        door_info(5, 0xFDFF9018)                        = 0
26157/2:        door_call(5, 0xFDFF9000)                        = 0
26157/2:        lwp_sigmask(SIG_SETMASK, 0xFFBFFEFF, 0x0000FFF7) = 0xFFBFFEFF [0x0000FFFF]
26157/2:        write(2, " T A O   ( 2 6 1 5 7 | 2".., 67)      = 67
26157/2:        lwp_sigmask(SIG_SETMASK, 0x00000000, 0x00000000) = 0xFFBFFEFF [0x0000FFFF]
26157/2:        lwp_sigmask(SIG_SETMASK, 0xFFBFFEFF, 0x0000FFF7) = 0xFFBFFEFF [0x0000FFFF]
26157/2:        write(2, " T A O   ( 2 6 1 5 7 | 2".., 47)      = 47
26157/2:        lwp_sigmask(SIG_SETMASK, 0x00000000, 0x00000000) = 0xFFBFFEFF [0x0000FFFF]
26157/2:        write(4, "\0\0\0\0\0\0\004", 8)                 Err#32 EPIPE
26157/2:            Received signal #13, SIGPIPE [ignored]
26157/2:              siginfo: SIG#0
26157/2:        open("/opt/nms/nls/C/SUNW_OST_OSLIB", O_RDONLY) Err#2 ENOENT
26157/2:        open("/usr/lib/locale/C/LC_MESSAGES/SUNW_OST_OSLIB.mo", O_RDONLY) Err#2 ENOENT
26157/2:        lwp_sigmask(SIG_SETMASK, 0xFFBFFEFF, 0x0000FFF7) = 0xFFBFFEFF [0x0000FFFF]
26157/2:        write(2, " s l e e p _ h o o k   f".., 31)      = 31
26157/2:        lwp_sigmask(SIG_SETMASK, 0x00000000, 0x00000000) = 0xFFBFFEFF [0x0000FFFF]
26157/11:       lwp_park(0x00000000, 0)                         = 0
26157/11:       lwp_sigmask(SIG_SETMASK, 0xFFBFFEFF, 0x0000FFF7) = 0xFFBFFEFF [0x0000FFFF]
26157/11:       lwp_sigmask(SIG_SETMASK, 0x00000000, 0x00000000) = 0xFFBFFEFF [0x0000FFFF]
26157/11:       pollsys(0xFD6FB9E0, 1, 0x00000000, 0x00000000) (sleeping...)
26157/2:        lwp_park(0x00000000, 0)         (sleeping...)
Comment 12 Sam 2007-08-02 07:29:29 CDT
Forgot to TAO version change.
Comment 13 Iliyan Jeliazkov 2007-08-02 21:24:17 CDT
(In reply to comment #11)

Hi Sam,

> /opt/TAO/bin/Notify_Service -IORoutput /tmp/nos.ior -ORBRunThreads 10
> -AllocateTaskperProxy -AsynchUpdates -ORBDebugLevel 10 -ORBLogFile /tmp/nos.log
> -ORBEndpoint iiop://1.0@belladona-3 -ORBInitRef NameService=file:///tmp/ns.ior
> -ORBDaemon


Something doesn't make sense to me. If this is (part of) the log file:

> ORBDebugLog:
> 
> ...
> (24812|1) Topology persistence disabled.
> Activating object with id = 1 in  POA : 1
> object:499e0 incr refcount = 1
> (24812|1) TAO_Notify_EventChannelFactory::activate_selfTAO (24812|1) -
...
> bytes, my endian, Type Request[1]
> GIOP message - HEXDUMP 154 bytes
> 47 49 4f 50 01 00 00 00  00 00 00 8e 00 00 00 01   GIOP............
> 00 00 00 01 00 00 00 0c  00 00 00 00 00 01 00 01   ................
> 00 01 01 09 00 00 00 01  01 00 00 00 00 00 00 3a   ...............:
> 14 01 0f 00 4e 55 50 00  00 00 1c 00 00 00 00 00   ....NUP.........
> 00 00 01 52 6f 6f 74 50  4f 41 00 4e 61 6d 65 53   ...RootPOA.NameS
> 65 72 76 69 63 65 00 00  00 00 00 00 00 00 01 4e   ervice.........N
> 61 6d 65 53 65 72 76 69  63 65 00 00 00 00 00 08   ameService......
> 74 6f 5f 6e 61 6d 65 00  00 00 00 00 00 00 00 1a   to_name.........
> 4e 6f 74 69 66 79 45 76  65 6e 74 43 68 61 6e 6e   NotifyEventChann
> 65 6c 46 61 63 74 6f 72  79 00                     elFactory.
> TAO (24812|1) - Transport[7]::cleanup_queue, byte_count = 154
> TAO (24812|1) - Transport[7]::cleanup_queue, after transfer, bc = 0, all_sent =
> 1, ml = 0
> TAO (24812|1) - Transport[7]::drain_queue_helper, byte_count = 154,
> head_is_empty = 1
...
> my endian, Type Reply[1]
> GIOP message - HEXDUMP 65 bytes
> 47 49 4f 50 01 00 00 01  00 00 00 35 00 00 00 00   GIOP.......5....
> 00 00 00 01 00 00 00 00  00 00 00 01 00 00 00 1a   ................
> 4e 6f 74 69 66 79 45 76  65 6e 74 43 68 61 6e 6e   NotifyEventChann
> 65 6c 46 61 63 74 6f 72  79 00 2f 48 00 00 00 01   elFactory./H....
> 00                                                 .
> TAO (24812|1) - TAO_Muxed_TMS::dispatch_reply, id = 1
> TAO (24812|1) - Connection_Handler[7]::handle_input, handle = 7/7, retval = 0
> TAO (24812|1) - Leader_Follower[7]::wait_for_event, (leader) exit reactor event
> loop
> TAO (24812|1) - Transport_Cache_Manager::is_entry_idle, state is [0]
> TAO (24812|1) - Transport_Cache_Manager::find_i, at index 0 (Transport[7]) -
> idle
> TAO (24812|1) - Transport_Connector::connect, got an existing connected
> Transport[7] in role TAO_CLIENT_ROLE
> TAO (24812|1) - Muxed_TMS[7]::request_id, <2>
> TAO (24812|1) - GIOP_Message_Base::dump_msg, send GIOP v1.0 msg, 255 data
> bytes, my endian, Type Request[2]
> GIOP message - HEXDUMP 267 bytes
> 47 49 4f 50 01 00 00 00  00 00 00 ff 00 00 00 00   GIOP............
> 00 00 00 02 01 00 00 0c  00 00 00 3a 14 01 0f 00   ...........:....
> 4e 55 50 00 00 00 1c 00  00 00 00 00 00 00 01 52   NUP............R
> 6f 6f 74 50 4f 41 00 4e  61 6d 65 53 65 72 76 69   ootPOA.NameServi
> 63 65 00 00 00 00 00 00  00 00 01 4e 61 6d 65 53   ce.........NameS
> 65 72 76 69 63 65 00 00  00 00 00 07 72 65 62 69   ervice......rebi
> 6e 64 00 53 00 00 00 00  00 00 00 01 00 00 00 1a   nd.S............
> 4e 6f 74 69 66 79 45 76  65 6e 74 43 68 61 6e 6e   NotifyEventChann
> 65 6c 46 61 63 74 6f 72  79 00 74 43 00 00 00 01   elFactory.tC....
> 00 6c 46 61 00 00 00 26  49 44 4c 3a 4e 6f 74 69   .lFa...&IDL:Noti
> 66 79 45 78 74 2f 45 76  65 6e 74 43 68 61 6e 6e   fyExt/EventChann
> 65 6c 46 61 63 74 6f 72  79 3a 31 2e 30 00 00 00   elFactory:1.0...
> 00 00 00 01 00 00 00 00  00 00 00 3f 00 01 00 00   ...........?....
> 00 00 00 0c 62 65 6c 6c  61 64 6f 6e 61 2d 33 00   ....belladona-3.
> a3 33 00 00 00 00 00 23  14 01 0f 00 4e 55 54 46   .3.....#....NUTF
> b1 cb d2 00 0b ba 4f 00  00 00 01 00 00 00 01 00   ......O.........
> 00 00 00 00 00 00 01 00  00 00 01                  ...........
> TAO (24812|1) - Transport[7]::cleanup_queue, byte_count = 267
> TAO (24812|1) - Transport[7]::cleanup_queue, after transfer, bc = 0, all_sent =
> 1, ml = 0
> TAO (24812|1) - Transport[7]::drain_queue_helper, byte_count = 267,
> head_is_empty = 1
...
> my endian, Type Reply[2]
> GIOP message - HEXDUMP 24 bytes
> 47 49 4f 50 01 00 00 01  00 00 00 0c 00 00 00 00   GIOP............
> 00 00 00 02 00 00 00 00                            ........
> TAO (24812|1) - TAO_Muxed_TMS::dispatch_reply, id = 2
> TAO (24812|1) - Connection_Handler[7]::handle_input, handle = 7/7, retval = 0
> TAO (24812|1) - Leader_Follower[7]::wait_for_event, (leader) exit reactor event
> loop
> Registered with the naming service as: NotifyEventChannelFactory
> ../../../../TAO/orbsvcs/Notify_Service/Notify_Service.cpp: Running the
> Notification Service
> ...
> TAO (24812|2) - Transport_Cache_Manager::fill_set_i, current_size = 1,
> cache_maximum = 128
> TAO (24812|2) - Concurrency_Strategy::activate_svc_handler, opened as
> TAO_SERVER_ROLE
> TAO (24812|2) - IIOP_Connection_Handler::open, The local addr is
> <192.168.0.103:41779>
> TAO (24812|2) - IIOP_Connection_Handler::open, IIOP connection to peer
> <192.168.0.103:41788> on 3
> TAO (24812|2) - Transport_Cache_Manager::bind_i, 0xfdffb854 -> 0x92628
> Transport[3]
> TAO (24812|2) - Transport_Cache_Manager::bind_i, cache size is [2]
> TAO (24812|2) - Transport[3]::register_handler
> sleep_hook failed: Broken pipe

I see the process id was 24812, hoever below the truss output shows 26157?!


> 26157/2:        lwp_sigmask(SIG_SETMASK, 0xFFBFFEFF, 0x0000FFF7) = 0xFFBFFEFF
> [0x0000FFFF]
> 26157/2:        write(2, " T A O   ( 2 6 1 5 7 | 2".., 84)      = 84
> 26157/2:        lwp_sigmask(SIG_SETMASK, 0x00000000, 0x00000000) = 0xFFBFFEFF
> ...


Looks like you are running a (TAO) client under truss, not the Notification Service. And what we see failing is that client process. Is this correct?

If, so - can you post the full command line for that client? Also, please post the stack trace at the point where the client causes the problem.

What happens to the NS process at that time? Does it go down, or remain running? Is there more to the NS server log _after_ the client dies?

Thanks,

--Iliyan Jeliazkov
Object Computing, Inc.
Comment 14 Sam 2007-08-03 04:10:37 CDT
Sorry, i've mixed log files when i was pasting them, but trace is from running Notify service. Logs are not from the same run, but Notify still fails in the same place.

This is how i was tracing Notify:
truss -f -o /tmp/nos.trace /opt/TAO/bin/Notify_Service -IORoutput /tmp/nos.ior -ORBRunThreads 10 -AllocateTaskperProxy -AsynchUpdates -ORBDebugLevel 10 -ORBLogFile /tmp/nos.log -ORBEndpoint iiop://1.0@belladona-3 -ORBInitRef NameService=file:///tmp/ns.ior -ORBDaemon

I'm going to setup again tao-1.5.10 testing environment on new machine so i can test it continuosly. As soon i'll have backtrace from debuger, i'll post it here.
Comment 15 Sam 2007-08-20 07:07:57 CDT
Finaly got a backtrace of the bug.

Core was generated by `/opt/TAO/bin/Notify_Service -IORoutput /tmp/nos.ior -ORBRunThreads 10 -Allocate'.
Program terminated with signal 10, Bus error.
#0  0xfde7ce0c in ACE_Event_Handler::Reference_Counting_Policy::value (this=0x494f5272) at ../../ace/Event_Handler.cpp:232
232     ../../ace/Event_Handler.cpp: No such file or directory.
        in ../../ace/Event_Handler.cpp
(gdb) bt
#0  0xfde7ce0c in ACE_Event_Handler::Reference_Counting_Policy::value (this=0x494f5272) at ../../ace/Event_Handler.cpp:232
#1  0xfdf0a6c8 in ACE_Select_Reactor_Notify::dispatch_notify (this=0x3e5b0, buffer=@0xfd7fb9b8) at ../../ace/Select_Reactor_Base.cpp:817
#2  0xfdf29da0 in ACE_TP_Reactor::handle_notify_events (this=0x3d6e8, guard=@0xfd7fbad4) at ../../ace/TP_Reactor.cpp:396
#3  0xfdf2a780 in ACE_TP_Reactor::dispatch_i (this=0x3d6e8, max_wait_time=0x0, guard=@0xfd7fbad4) at ../../ace/TP_Reactor.cpp:238
#4  0xfdf2a8fc in ACE_TP_Reactor::handle_events (this=0x3d6e8, max_wait_time=0x0) at ../../ace/TP_Reactor.cpp:174
#5  0xfdeeeea8 in ACE_Reactor::handle_events (this=0x3dfc8, max_wait_time=0x0) at ../../ace/Reactor.cpp:420
#6  0xfe1b4b24 in TAO_ORB_Core::run (this=0x33ab0, tv=0x0, perform_work=0) at ../../../TAO/tao/ORB_Core.cpp:2177
#7  0xfe1ae19c in CORBA::ORB::run (this=0x3f610, tv=0x0) at ../../../TAO/tao/ORB.cpp:202
#8  0xfe1ae1e0 in CORBA::ORB::run (this=0x3f610) at ../../../TAO/tao/ORB.cpp:188
#9  0x00015368 in Worker::svc (this=0xffbff350) at ../../../../TAO/orbsvcs/Notify_Service/Notify_Service.cpp:451
#10 0xfdf2bc7c in ACE_Task_Base::svc_run (args=0xffbff350) at ../../ace/Task.cpp:271
#11 0xfdf2cce4 in ACE_Thread_Adapter::invoke_i (this=0x41558) at ../../ace/Thread_Adapter.cpp:146
#12 0xfdf2cdf8 in ACE_Thread_Adapter::invoke (this=0x41558) at ../../ace/Thread_Adapter.cpp:95
#13 0xfde5e174 in ace_thread_adapter (args=0x41558) at ../../ace/Base_Thread_Adapter.cpp:116
#14 0xfd9bfda4 in _lwp_start () from /lib/libc.so.1
#15 0xfd9bfda4 in _lwp_start () from /lib/libc.so.1
Backtrace stopped: previous frame identical to this frame (corrupt stack?)


Logfile strip:

ACE (13366|11) SG::ctor - this = 4d698, pss = 0
ACE (13366|11) Service_Config_Guard:<ctor> - repo=95498 superceded by repo=2e3c8
TAO (13366|11) - ORB_Core::run, start [run]
TAO (13366|11) - ORB_Core::run, calling handle_events()
TAO (13366|2) - Transport_Cache_Manager::fill_set_i, current_size = 1, cache_maximum = 128
TAO (13366|2) - Concurrency_Strategy::activate_svc_handler, opened as TAO_SERVER_ROLE
TAO (13366|2) - IIOP_Connection_Handler::open, The local addr is <192.168.0.102:41826>
TAO (13366|2) - IIOP_Connection_Handler::open, IIOP connection to peer <192.168.0.102:41836> on 3
TAO (13366|2) - Transport_Cache_Manager::bind_i, 0xfd7fb638 -> 0x96628 Transport[3]
TAO (13366|2) - Transport_Cache_Manager::bind_i, cache size is [2]
TAO (13366|2) - Transport[3]::register_handler
sleep_hook failed: Broken pipe
Comment 16 Johnny Willemsen 2008-03-31 05:46:06 CDT
lowering priority